Abstract

A wide variety of problems in machine learning, including exemplar clustering, document 
summarization, and sensor placement, can be cast as constrained submodular maximization 
problems. A lot of recent effort has been devoted to developing distributed algorithms for these 
problems. However, these results suffer from a high number of rounds, suboptimal approximation
ratios, or both. We develop a framework for bringing existing algorithms in the sequential setting 
to the distributed setting, achieving near optimal approximation ratios for many settings in only 
a constant number of MapReduce rounds. Our techniques also give a fast sequential algorithm 
for non-monotone maximization subject to a matroid constraint. 

1 Introduction 

The general problem of maximizing a submodular function appears in a variety of contexts, both in 
theory and practice. From a theoretical perspective, the class of submodular functions is extremely 
rich, including examples as varied as cut functions of graphs and digraphs, the Shannon entropy 
function, weighted coverage functions, and log-determinants. Recently, there has been a great deal 
of interest in practical applications of submodular optimization as well. Variants of facility location,
sampling, sensor selection, clustering, influence maximization in social networks, and welfare
maximization problems are all instances of submodular maximization. In practice, many of these 
applications involve processing enormous datasets requiring efficient, distributed algorithms. 

In contrast, most successful approaches for submodular maximization have been based on sequential
greedy algorithms, including the standard greedy algorithm [MlIIS], the continuous greedy
algorithm m I15|, and the double greedy algorithm [7]. Indeed, such approaches attain the best-
possible, tight approximation guarantees in a variety of settings [14, 26, 13], but unfortunately they
all share a common limitation, inherited from the standard greedy algorithm: they are inherently 
sequential. This presents a seemingly fundamental barrier to obtaining efficient, highly parallel 
variants of these algorithms. 

1.1 Our Contributions 

As demonstrated by the extensive prior works on submodular maximization, the community has a 
good understanding of the problem under remarkably general types of constraints, which are handled 
by a small collection of general algorithms. In contrast, the existing works in the distributed setting 
are either tailored to special cases, giving approximation factors far from optimal or requiring a large
number of distributed rounds. One cannot help but wonder if, instead of retracing the individual 
advances made in the sequential setting over the last few decades, it may be possible to obtain a 
generic technique to carry over the algorithms in the sequential setting to the parallel world. 

In this work, we present a significant step toward resolving the above question. Our main 
contribution is a generic parallel algorithm that allows us to parallelize a broad class of sequential
algorithms with almost no loss in performance. The crux of our approach is a common abstraction
that allows us to capture and parallelize both the standard and continuous greedy algorithms, and 
it provides a novel unifying perspective for these algorithmic paradigms. Our framework leads to 
the first distributed algorithms that nearly match the state of the art approximation guarantees for 
the sequential setting in only a constant number of rounds. In the following, we summarize our 
main contributions. 

A parallel greedy algorithm. We obtain the following general result by parallelizing the standard 
greedy algorithm: 

Theorem 5.2. Let $f : 2^V \to \mathbb{R}_+$ be a submodular function, and $\mathcal{I} \subseteq 2^V$ be a hereditary set system^1.
For any $\epsilon > 0$ there is a randomized distributed $O(1/\epsilon)$-round algorithm that can be implemented in
the MapReduce framework^2. The algorithm is an $(\alpha - O(\epsilon))$-approximation with constant probability
for the problem $\max_{S \in \mathcal{I}} f(S)$, where $\alpha$ is the approximation ratio of the standard, sequential greedy
algorithm for the same problem.

Our constant number of rounds is a significant improvement over the sample and prune technique 
of [20], which requires a number of rounds depending logarithmically on the value of the single best
element. Remarkably, even for the especially simple case of a cardinality constraint, no previous 
work could get close to the approximation ratio of the simple sequential greedy algorithm in a 
constant number of rounds. Our framework nearly matches the approximation ratio of greedy in 
all situations in a constant number of rounds and immediately resolves this problem. 

A parallel continuous greedy algorithm. We obtain new distributed approximation results 
for maximization over matroids, by using a heavily discretized variant of the measured continuous
greedy algorithm, obtaining approximation guarantees nearly matching those attained by the
continuous greedy in the sequential setting. 

Theorem 6.3. Let $f : 2^V \to \mathbb{R}_+$ be a submodular function, and $\mathcal{I} \subseteq 2^V$ be a matroid. For any $\epsilon > 0$
there is a randomized distributed $O(1/\epsilon)$-round algorithm that can be implemented in the MapReduce
framework. The algorithm is an $(\alpha - O(\epsilon))$-approximation with constant probability for the problem
$\max_{S \in \mathcal{I}} f(S)$, where $\alpha$ is $(1 - 1/e)$ for monotone $f$ and $1/e$ for general $f$.

Improved two-round algorithms and fast sequential algorithms. We also give improved two- 
round approximations for non-monotone submodular maximization under hereditary constraints. 
We make use of the same “strong greedy property” utilized in m but attain approximation guarantees
strictly better than were given there. Our algorithm is based on a combination of the standard
greedy algorithm Greedy and an additional, arbitrary algorithm Alg. Again, we suppose that $f$ is
a (not necessarily monotone) submodular function and $\mathcal{I}$ is any hereditary constraint. In the following
theorems and throughout the paper, $n := |V|$ is the size of the ground set, $k := \max_{S \in \mathcal{I}} |S|$

^1 A set system is hereditary if for any $S \in \mathcal{I}$, all subsets of $S$ are also in $\mathcal{I}$.

^2 We define the MapReduce model in Section 2.
Table 1: New results for distributed submodular maximization. Here $\Delta = \max_{i \in V} f(\{i\})$ and $m$
is the number of machines. In the results of m, in the number of rounds, $\Delta$ can be replaced by
the maximum size of a solution. All algorithms in previous works and ours are randomized and the 
approximation guarantees stated hold in expectation, and they can be strengthened to hold with 
high probability by repeating the algorithms in parallel. 


is the maximum size of a solution, and m is the number of machines employed by the distributed 
algorithm. 

Theorem 7.1. Suppose that Greedy satisfies the strong greedy property with constant $\gamma$ and that
Alg is a $\beta$-approximation for the problem $\max_{S \in \mathcal{I}} f(S)$. Then there is a randomized, two-round
distributed algorithm that achieves a (1 - [...]) approximation in expectation for $\max_{S \in \mathcal{I}} f(S)$.

We show that by simulating the machines in this last distributed algorithm, we also obtain 
a fast, sequential algorithm for maximizing a non-monotone submodular function subject to a 
matroid constraint. Our algorithm shows that one can preprocess the instance in O(^logn) time 
and obtain a set $X$ of size $O(k/\epsilon)$ so that it suffices to solve the problem on $X$. By using a variant
of the continuous greedy algorithm on the resulting set X, we obtain the following result. 

Theorem 7.2. There is a sequential, randomized [...]-approximation algorithm for the problem
$\max_{S \in \mathcal{I}} f(S)$, where $\mathcal{I}$ is any matroid constraint, running in time $O([...] \log n) + \mathrm{poly}([...])$.

As a final application of our techniques, we obtain a very simple two-round distributed algorithm 
for monotone maximization subject to a cardinality constraint. 

Theorem 7.3. There is a randomized, two-round, distributed algorithm achieving a ([...] $- \epsilon$) approximation
in expectation for $\max_{S : |S| \le k} f(S)$, where $f$ is a monotone function.

1.2 Techniques 

In contrast with the previous framework of [20], which is based on repeatedly eliminating bad
elements, our framework is more in line with the greedy approach of identifying good elements.
The algorithm maintains a pool of good elements that is grown over several rounds. In each round,
the elements are partitioned randomly into groups. Each group selects the best among its elements
and the good pool using the sequential algorithm. Finally, the best elements from all groups are 
added to the good pool. The best solution among the ones found in the execution of the algorithm 
is returned at the end. The previous works based on 2 rounds of MapReduce such as m can be 
viewed as a single phase of our algorithm. The first phase can already identify a constant fraction 
of the weight of the solution, thus obtaining a constant factor approximation. However, it is not 
clear how to obtain the best approximation factor from such an approach. Our main insight is that,
with the right measure of progress, we can grow the solution iteratively and obtain solutions that are
arbitrarily close to those of sequential algorithms. We show that after only $O(1/\epsilon)$ rounds, the pool
of good elements already contains a good solution with constant probability.

1.3 Related Work 

There has been a recent push toward obtaining fast, practical algorithms for submodular maximization
problems arising in a variety of applied settings. Research in this direction has yielded
a variety of techniques for speeding up the continuous greedy algorithm for monotone maximization
[3, 22], as well as new approaches for non-monotone maximization based on insights from both
the continuous greedy and double greedy algorithms Eli. Of particular relevance to our results
is the case of maximization under a matroid constraint. Here, for monotone functions the fastest
current sequential algorithm gives a $1 - 1/e - \epsilon$ approximation using $O([...])$ value
queries. For non-monotone functions, Buchbinder et al. give a [...] $> 0.283$-approximation in
time $O(kn \log n + Mk)$, where $M$ is the time required to compute a perfect matching on a bipartite
graph with $k$ vertices per side. They also give a simple, combinatorial $1/4$-approximation in time
$O(kn \log n)$. In comparison, the sequential algorithm we present here is faster by a factor of $\Omega(k)$,
at the cost of a slightly weaker [...] $> 0.211$-approximation.

Work on parallel and distributed algorithms for submodular maximization has been comparatively
limited. Early results considered the special case of maximum $k$-coverage, and attained an
$O(1 - 1/e - \epsilon)$-approximation [11, 5]. Later, Kumar et al. [20] considered the more general problem
of maximizing an arbitrary monotone submodular function subject to a matroid, knapsack,
or $p$-system constraint. Their approach attains a [...] approximation for matroids, and requires
$O([...] \log \Delta)$ MapReduce rounds, where $\Delta$ is the value of the best single element. More generally,
they obtain a [...] approximation for $p$-systems in $O([...] \log \Delta)$ rounds. The factor of $\log \Delta$ in the
number of rounds is inherent in their approach: they adapt the threshold greedy algorithm, which
sequentially picks elements in $\log \Delta$ different thresholds. In another line of work, Mirzasoleiman
et al. [23] introduced a simple, two-round distributed greedy algorithm for submodular maximization.
While their algorithm is only an $O([...])$-approximation in the worst case, it performs very
well in practice, and attains provable constant-factor guarantees for submodular functions exhibiting
certain additional structure. Barbosa et al. m recently gave a more sophisticated analysis of
this approach and showed that, if the initial distribution of elements is performed randomly, the
algorithm indeed gives an expected, constant-factor guarantee for a variety of problems. Finally,
Mirrokni and Zadimoghaddam [21] gave the currently best 0.545-approximation for the cardinality
constraint case using only 2 rounds of MapReduce.

2 The model 

We adopt the most stringent MapReduce-style model among dsi Ezi laig, the Massively Parallel
Communication (MPC) model from [1] as specified by [2]. Let $N$ be the size of the input. In this
model, there are $M$ machines each with space $S$. The total memory of the system is $M \cdot S = O(N)$,
which is at most a constant factor more than the input size. Computation proceeds in synchronous
rounds. In each round, each machine can perform local computation and at the end, it can send at
most a total of $O(S)$ words to other machines. These $O(S)$ words could form a single message of
size $S$, $S$ messages of size 1, or any other combination whose sum is at most $O(S)$. Following [19],
we restrict both $M, S \le$ [...]. The typical main complexity measure is the number of rounds.

Note that not all previous works on MapReduce-style algorithms for submodular maximization
satisfy the strict requirements of the MPC model. For instance, as stated, the previous work by
Kumar et al. [20] uses $\Omega(N \log N)$ total memory and thus it does not fit in this model (though it
might be possible to modify their algorithms to satisfy this).

We assume that the size of the solution is at most [...] for some constant $0 < c < 1/2$. Thus,
an entire solution can be stored on a single machine in the model. This assumption is also used in
previous work such as [21].

3 Preliminaries 

A function $f : 2^V \to \mathbb{R}_+$ is submodular if and only if $f(A \cup \{e\}) - f(A) \ge f(B \cup \{e\}) - f(B)$ for all
$A \subseteq B$ and $e \notin B$. If $f(A \cup \{e\}) - f(A) \ge 0$ for all $A$ and $e \notin A$ we say that $f$ is monotone. Here
we consider the general problem $\max\{f(S) : S \subseteq V, S \in \mathcal{I}\}$, where $\mathcal{I}$ is any hereditary constraint
(i.e., a downward-closed family of subsets of $V$).
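The two definitions above can be checked by brute force on small ground sets. The following is a minimal sketch, not part of the original paper: the coverage instance `COVER`, the 4-cycle `EDGES`, and all function names are hypothetical and chosen only for illustration. It confirms that a coverage function is monotone submodular, while a graph cut function is submodular but not monotone.

```python
from itertools import combinations

def powerset(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def is_submodular(f, ground):
    """Check f(A+e) - f(A) >= f(B+e) - f(B) for all A <= B and e not in B."""
    subsets = list(powerset(ground))
    for b in subsets:
        for a in subsets:
            if not a <= b:
                continue
            for e in ground - b:
                if f(a | {e}) - f(a) < f(b | {e}) - f(b) - 1e-12:
                    return False
    return True

def is_monotone(f, ground):
    """Check f(A+e) - f(A) >= 0 for all A and e not in A."""
    return all(f(a | {e}) - f(a) >= -1e-12
               for a in powerset(ground) for e in ground - a)

# Hypothetical instance: each element covers a few items.
COVER = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}, "d": {1, 5}}
GROUND = frozenset(COVER)
# Hypothetical graph: a 4-cycle on the same ground set.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]

def coverage(s):
    """Number of items covered by the elements of s."""
    return len(set().union(*(COVER[e] for e in s)))

def cut_value(s):
    """Number of edges with exactly one endpoint in s."""
    return sum(1 for u, v in EDGES if (u in s) != (v in s))
```

The check enumerates all pairs of subsets, so it is only practical for ground sets of a handful of elements; it is meant as a sanity check for the definitions, not as an algorithmic tool.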

Throughout the paper, $n := |V|$ is the size of the ground set, $k := \max_{S \in \mathcal{I}} |S|$ is the maximum
size of a solution, and $m$ is the number of machines employed by the distributed algorithm.

We shall consider both monotone and non-monotone submodular functions. However, the following
simple observation shows that even non-monotone submodular functions are monotone when
restricted to the optimal solution of a problem of the sort we consider.

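Lemma 3.1. Let $f$ be a submodular function and $\mathrm{OPT} = \arg\max_{S \in \mathcal{I}} f(S)$ for some hereditary
constraint $\mathcal{I}$. Then, $f(A \cap \mathrm{OPT}) \le f(B \cap \mathrm{OPT})$ for all $A \subseteq B$.

Proof: Consider $X \subseteq \mathrm{OPT}$ and $e \in \mathrm{OPT} \setminus X$. By submodularity, $f(X \cup \{e\}) - f(X) \ge f(\mathrm{OPT}) -
f(\mathrm{OPT} \setminus \{e\})$. On the other hand, because $\mathcal{I}$ is hereditary, $\mathrm{OPT} \setminus \{e\}$ is feasible and thus $f(\mathrm{OPT}) \ge
f(\mathrm{OPT} \setminus \{e\})$. Therefore $f(X \cup \{e\}) - f(X) \ge 0$ for all $X \subseteq \mathrm{OPT}$ and $e \in \mathrm{OPT} \setminus X$. □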

Continuous extensions. In this paper, we work with two standard continuous extensions of
submodular functions, the multilinear extension and the Lovász extension. The multilinear extension
of $f$ is the function $F : [0,1]^V \to \mathbb{R}_+$ such that $F(x) = \mathbb{E}[f(R(x))]$, where $R(x)$ is a random subset
of $V$ in which each element $e$ appears independently with probability $x_e$.
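To make the definition of the multilinear extension concrete, the sketch below evaluates $F(x)$ in two ways: exactly, by summing over all subsets, and approximately, by averaging $f$ over independent samples of $R(x)$. The coverage instance and all identifiers are hypothetical and introduced only for this illustration; the sampling estimator is the standard Monte Carlo approach and is not claimed to be the estimator used by the authors.

```python
import random
from itertools import product

# Hypothetical coverage instance used only for illustration.
COVER = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}, "d": {1, 5}}
GROUND = frozenset(COVER)

def coverage(s):
    """Number of items covered by the elements of s."""
    return len(set().union(*(COVER[e] for e in s)))

def multilinear_exact(f, ground, x):
    """F(x) = sum_S f(S) * prod_{e in S} x_e * prod_{e not in S} (1 - x_e)."""
    items = sorted(ground)
    total = 0.0
    for mask in product([0, 1], repeat=len(items)):
        prob = 1.0
        chosen = set()
        for e, bit in zip(items, mask):
            prob *= x[e] if bit else 1.0 - x[e]
            if bit:
                chosen.add(e)
        total += prob * f(frozenset(chosen))
    return total

def multilinear_sampled(f, ground, x, samples=20000, seed=0):
    """Monte Carlo estimate of F(x): average f over samples of R(x)."""
    rng = random.Random(seed)
    items = sorted(ground)
    acc = 0.0
    for _ in range(samples):
        r = frozenset(e for e in items if rng.random() < x[e])
        acc += f(r)
    return acc / samples
```

The exact routine takes time exponential in $|V|$, which is why sampling is the practical choice whenever $F$ has to be evaluated on more than a few elements.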

The Lovász extension of $f$ is the function $f^- : [0,1]^V \to \mathbb{R}_+$ such that $f^-(x) = \mathbb{E}_{\theta \sim U(0,1)}[f(\{e : x_e \ge \theta\})]$,
where $U(0,1)$ is the uniform distribution on $[0,1]$. For any submodular function $f$, the
Lovász extension $f^-$ satisfies: $f^-(1_S) = f(S)$ for all $S \subseteq V$; $f^-$ is convex; and the restricted scale
invariance property $f^-(c \cdot x) \ge c \cdot f^-(x)$ for any $c \in [0,1]$. We shall make use of the following
lemmas.

Lemma 3.2 ([12], Lemma 1). Let $S$ be a random set with $\mathbb{E}[1_S] = c \cdot p$ (for $c \in [0,1]$). Then,
$\mathbb{E}[f(S)] \ge c \cdot f^-(p)$.

Lemma 3.3. Let $f : 2^V \to \mathbb{R}_+$ be a submodular function that is monotone when restricted to
$X \subseteq V$. Further, let $T, S \subseteq X$, and let $R$ be a random subset of $T$ in which every element occurs
with probability at least $p$. Then, $\mathbb{E}[f(R \cup S)] \ge p \cdot f(T \cup S) + (1 - p) f(S)$.
Proof: Recall that $f^-$ is the Lovász extension of $f$. Since $f^-$ is convex,

$\mathbb{E}[f(R \cup S)] = \mathbb{E}[f^-(1_{R \cup S})] \ge f^-(\mathbb{E}[1_{R \cup S}]) = f^-(\mathbb{E}[1_{R \setminus S}] + 1_S)$.

Since every element of $T$ occurs in $R$ with probability at least $p$, we have $\mathbb{E}[1_{R \setminus S}] \ge p \cdot 1_{T \setminus S}$. Then,
since $f$ is monotone with respect to $X \supseteq S \cup T$, we must have:

$f^-(\mathbb{E}[1_{R \setminus S}] + 1_S) \ge f^-(p \cdot 1_{T \setminus S} + 1_S)$.

Finally, from the definition of $f^-$, we have

$f^-(p \cdot 1_{T \setminus S} + 1_S) = p \cdot f(T \cup S) + (1 - p) f(S)$.

□
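The last step of the proof uses only the definition of the Lovász extension: for $x = p \cdot 1_{T \setminus S} + 1_S$ the level set $\{e : x_e \ge \theta\}$ equals $T \cup S$ when $\theta \le p$ and $S$ otherwise. The sketch below computes $f^-$ exactly by integrating over the finitely many distinct level sets and can be used to verify this identity numerically. The coverage instance and the function names are hypothetical, introduced only for illustration.

```python
# Hypothetical coverage instance used only for illustration.
COVER = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}, "d": {1, 5}}

def coverage(s):
    """Number of items covered by the elements of s."""
    return len(set().union(*(COVER[e] for e in s)))

def lovasz_extension(f, x):
    """Exact f^-(x) = E_{theta ~ U(0,1)}[ f({e : x_e >= theta}) ].

    `x` maps every ground element to a value in [0, 1]. Between two
    consecutive distinct coordinate values lo < hi, the level set is
    constant and equal to {e : x_e >= hi}, so the expectation is a
    finite weighted sum.
    """
    levels = sorted({0.0, 1.0} | set(x.values()))
    value = 0.0
    for lo, hi in zip(levels, levels[1:]):
        level_set = frozenset(e for e, xe in x.items() if xe >= hi)
        value += (hi - lo) * f(level_set)
    return value
```

For example, with $S = \{a\}$, $T = \{b, c\}$ and $p = 0.3$ on this instance, the routine returns $0.3 \cdot f(\{a,b,c\}) + 0.7 \cdot f(\{a\})$, matching the identity above.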

4 Generic Parallel Algorithm for Submodular Maximization

In this section, we give a generic approach for parallelizing any sequential algorithm Alg for the
problem $\max_{S \subseteq V : S \in \mathcal{I}} f(S)$, where $f : 2^V \to \mathbb{R}_+$ is a submodular function and $\mathcal{I} \subseteq 2^V$ is a hereditary
constraint.

As a starting point, we need a common abstract description of existing sequential algorithms. Towards
that end, we turn to the standard Greedy and Continuous Greedy algorithms for inspiration.
The Greedy algorithm directly constructs a solution, whereas the Continuous Greedy algorithm first
constructs a fractional solution $x$ which is then rounded to get an integral solution. In the common
abstraction, we will need both the integral solution and the support of the fractional solution $x$. To
account for this, we will have the algorithm Alg return a pair of sets, $(\mathrm{AlgSol}(N), \mathrm{AlgRel}(N))$, where
$\mathrm{AlgSol}(N) \in \mathcal{I}$ is a feasible solution for the problem and $\mathrm{AlgRel}(N)$ is a set providing additional
information. When using the standard Greedy algorithm for Alg, $\mathrm{AlgSol}(N)$ and $\mathrm{AlgRel}(N)$ will both
be equal to the Greedy solution. When using the Continuous Greedy algorithm for Alg, $\mathrm{AlgSol}(N)$
will be the integral solution and $\mathrm{AlgRel}(N)$ will be the support of the fractional solution constructed
by the Continuous Greedy algorithm.

More importantly, we will need an abstraction that captures the greedy behavior of these algorithms.
We encapsulate the crucial properties of greedy-like algorithms in the following definition.
We believe that this framework is one of the most valuable and insightful contributions of this work, 
and it provides a general abstraction for a broader class of algorithms. 

We assume that the algorithm Alg satisfies the following properties. 

1. ($\alpha$-Approximation) For every input $N \subseteq V$, $\mathrm{AlgSol}(N)$ is an $\alpha$-approximate solution to
$\max_{S \subseteq N : S \in \mathcal{I}} f(S)$.

2. (Consistency) Let $A$ and $B$ be two disjoint subsets of $V$. Suppose that, for each element
$e \in B$, we have $\mathrm{AlgRel}(A \cup \{e\}) = \mathrm{AlgRel}(A)$. Then $\mathrm{AlgSol}(A \cup B) = \mathrm{AlgSol}(A)$.

Armed with this definition, we can now describe our approach for parallelizing an abstract 
sequential algorithm Alg with almost no loss in the approximation guarantee. 

Parallel algorithm ParallelAlg based on Alg. As before, let $\alpha$ be the approximation guarantee
of the sequential algorithm Alg. Let $s := \max_{N \subseteq V} |\mathrm{AlgSol}(N) \cup \mathrm{AlgRel}(N)|$ be the maximum size
of the sets returned by Alg. Let $\epsilon > 0$ be the desired accuracy, i.e., we will aim that ParallelAlg
achieves an $(\alpha - \epsilon)$ approximation.
The algorithm uses $g := O(1/(\alpha\epsilon))$ groups of machines with $m$ machines in each group (and thus
the total number of machines is $gm$). The number $m$ of machines can be chosen arbitrarily and it
will determine the amount of space needed on each machine, since the dataset is divided roughly
equally among the $m$ machines in each group. An optimal setting is $gm := O(\sqrt{n/s})$.

The algorithm performs $O(1/\epsilon)$ runs. Throughout the process, we maintain two quantities: an
incumbent solution $S_{\text{best}}$, which is the best solution produced on any single machine so far in the
process, and a pool of elements $C \subseteq V$ (we assume that the incumbent solution is stored on one
designated machine).

Each run of the algorithm proceeds as follows. Amongst each group of $m$ machines, we partition
$V$ uniformly at random; each element $e$ chooses an index $i \in [m]$ uniformly and independently at
random and is assigned to the $i$th machine in the group. We do this separately for each group
of machines, i.e., each element appears on exactly one machine in each group. For an individual
machine $i \in [gm]$, let $X_{i,r}$ denote the set of elements that are assigned to $i$ in run $r$ by this procedure.
Additionally, we place on each machine the same pool of elements $C_{r-1}$, constructed at the end of
run $r - 1$.

Once the elements have been distributed as described above, on each machine $i$ we run the
algorithm Alg on the input $X_{i,r} \cup C_{r-1}$ to obtain $(\mathrm{AlgSol}(X_{i,r} \cup C_{r-1}), \mathrm{AlgRel}(X_{i,r} \cup C_{r-1}))$.
We update the incumbent solution $S_{\text{best}}$ to be the better of the current solution $S_{\text{best}}$ and
the solutions $\mathrm{AlgSol}(X_{i,r} \cup C_{r-1})$ constructed on each of the machines; this is achieved by having each
machine send $\mathrm{AlgSol}(X_{i,r} \cup C_{r-1})$ to some designated machine maintaining $S_{\text{best}}$; this machine
will update $S_{\text{best}}$ in the next round. We update the pool by setting $C_r := C_{r-1} \cup \bigcup_i \mathrm{AlgRel}(X_{i,r} \cup C_{r-1})$;
this is achieved by having each machine send $\mathrm{AlgRel}(X_{i,r} \cup C_{r-1})$ to every other machine, thus
ensuring that the pool $C_r$ is available on each machine during the next round.

At the end of the $O(1/\epsilon)$ runs, the algorithm returns the incumbent solution $S_{\text{best}}$. This
completes the description of our algorithm.
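The procedure just described can be simulated on a single machine. The following is a minimal sketch under several assumptions that are not taken from the paper: Alg is the standard greedy algorithm, the hereditary constraint is a cardinality constraint $|S| \le k$, the objective is a small hypothetical coverage function, and the numbers of groups, machines, and runs are passed in directly rather than derived from $\alpha$ and $\epsilon$. All identifiers are illustrative.

```python
import random

# Hypothetical coverage instance: element i covers four items modulo 20.
SETS = {i: {(3 * i + j) % 20 for j in range(4)} for i in range(30)}

def coverage(s):
    """Number of items covered by the elements of s."""
    return len(set().union(*(SETS[e] for e in s)))

def greedy(f, candidates, k):
    """Standard greedy under |S| <= k; returns (AlgSol, AlgRel).

    For Greedy both returned sets equal the greedy solution. Ties are
    broken by the sorted order of the candidates.
    """
    sol = []
    order = sorted(candidates)
    while len(sol) < k:
        base = f(frozenset(sol))
        best, best_gain = None, 0.0
        for e in order:
            if e in sol:
                continue
            gain = f(frozenset(sol) | {e}) - base
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no element has positive marginal gain
            break
        sol.append(best)
    result = frozenset(sol)
    return result, result

def parallel_alg(f, ground, k, groups, machines, runs, seed=0):
    """Sequential simulation of the runs of ParallelAlg with Alg = greedy."""
    rng = random.Random(seed)
    pool = frozenset()   # pool C_{r-1}, shared by all machines
    best = frozenset()   # incumbent solution S_best
    for _ in range(runs):
        new_rel = set()
        for _group in range(groups):
            # Each group independently partitions the whole ground set.
            parts = [set() for _ in range(machines)]
            for e in sorted(ground):
                parts[rng.randrange(machines)].add(e)
            for part in parts:
                sol, rel = greedy(f, frozenset(part) | pool, k)
                if f(sol) > f(best):
                    best = sol
                new_rel |= rel
        pool = pool | new_rel  # C_r := C_{r-1} plus all relevant sets
    return best, pool
```

With a single group, a single machine, and a single run, the simulation reduces to running greedy on the whole ground set, which gives a simple consistency check for the implementation.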

Avoiding duplicating the dataset. The algorithm above partitions the dataset over $O(1/\epsilon)$
groups of machines and thus it duplicates the dataset $O(1/\epsilon)$ times (this problem also applies to
previous work IS]). This is done in order to achieve the best theoretical guarantee on the number of
runs, but in practice it is undesirable to duplicate the data. Instead, we can use a single group of $m$
machines and perform the computation of a single run sequentially over $O(1/\epsilon)$ sub-runs, where each
sub-run performs the computation of one of the groups of machines. This will lead to an algorithm
that performs $O(1/\epsilon^2)$ runs using $m$ machines and it does not duplicate the dataset.

The analysis. We devote the rest of this section to the analysis of the algorithm ParallelAlg. We
start by noting that, if we choose $g$ and $m$ so that $gm = O(\sqrt{n/s})$, the algorithm uses the following
resources and thus it satisfies the requirements of the model in Section 2.

Lemma 4.1. ParallelAlg can be implemented in the parallel model in Section 2 using the following
resources.

• The number of rounds is $O(1/\epsilon)$.

• The number of machines is $O(\sqrt{n/s})$.

• The amount of space used on each machine is $O(\sqrt{ns}/(\epsilon\alpha))$ with high probability.

• In each round, the total amount of communication from a machine to all other machines is
$O(\sqrt{ns}/(\epsilon\alpha))$ with high probability. The total amount of communication over all machines in
a given round is $O(n/(\epsilon\alpha))$.
Proof: We will choose $gm := \sqrt{n/s}$ as our number of machines. Using this choice, we can provide
the guarantees stated in the lemma.

Note that we can combine the update step of the incumbent solution and the pool of a given
run with the next run’s distribution of elements into a single round of communication. Specifically,
each machine computes a new random assignment for each element of its sample, assigns all of
its new pool elements to all machines, and sends its solution to the designated machine. Thus each
run corresponds to a round of communication. In each round, a machine communicates its sample,
which has size $O(n/m) = O(\sqrt{ns}/(\epsilon\alpha))$ with high probability, and the sets $\mathrm{AlgSol}(X_{i,r} \cup C_{r-1})$
and $\mathrm{AlgRel}(X_{i,r} \cup C_{r-1})$ that have size $O(s)$ to all other machines. Thus the total amount that a
machine communicates is $O(\sqrt{ns}/(\epsilon\alpha) + s \cdot gm) = O(\sqrt{ns}/(\epsilon\alpha))$ with high probability, and the total
amount that all machines communicate is $O(n + (n/m) \cdot gm) = O(n/(\epsilon\alpha))$.

In every round, the space used on a given machine is the size of its sample $X_{i,r}$, which is
$O(n/m) = O(\sqrt{ns}/(\epsilon\alpha))$ with high probability; the size of the incumbent solution, which is $O(s)$;
and the size of the pool, which is $O(gm \cdot s/\epsilon) = O(\sqrt{ns}/\epsilon)$. Therefore the total amount of space
used on each machine is $O(\sqrt{ns}/(\epsilon\alpha))$ with high probability. □

Thus it remains to analyze the quality of the solution constructed by the algorithm. In the
remainder of this section, we show that, if Alg satisfies the $\alpha$-approximation and consistency properties
defined above, the parallel algorithm ParallelAlg achieves an $(\alpha - O(\epsilon))$ approximation. For
simplicity, in this section we assume that Alg is deterministic; in Section [...] we extend our approach
to the setting in which Alg is randomized. We start by introducing some notation. Let $\mathcal{V}(1/m)$
denote the distribution over random subsets of $V$ where each element is included independently
with probability $1/m$. Let OPT be an optimal solution. Recall that $X_{i,r} \sim \mathcal{V}(1/m)$ is the random
sample placed on machine $i$ at the beginning of run $r$ and $\mathcal{C}_{r-1}$ is the pool of elements at the
beginning of run $r$. The following theorem is the crux of our analysis.

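Theorem 4.2. Consider a run $r \ge 1$ of the algorithm. Let $C_{r-1} \subseteq V$. Then one of the following
must hold:

(1) $\mathbb{E}_{X_{i,r}}[f(\mathrm{AlgSol}(C_{r-1} \cup X_{i,r})) \mid \mathcal{C}_{r-1} = C_{r-1}] \ge (1 - \epsilon)^2 \alpha \cdot f(\mathrm{OPT})$, or

(2) $\mathbb{E}[f(\mathcal{C}_r \cap \mathrm{OPT}) \mid \mathcal{C}_{r-1} = C_{r-1}] - f(C_{r-1} \cap \mathrm{OPT}) \ge \frac{\epsilon}{2} \cdot f(\mathrm{OPT})$.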

Intuitively, Theorem 4.2 shows that, in expectation, if we have not found a good solution on
some machine after $O(1/\epsilon)$ runs, then the current pool $C$, available to every machine, must satisfy
$f(C \cap \mathrm{OPT}) = f(\mathrm{OPT})$, and so each machine in the next run will in fact return a solution of quality
at least $\alpha f(\mathrm{OPT})$. The following theorem, whose proof we give in Section [...], makes this formal.

Theorem 4.3. ParallelAIg achieves an (1 — e)^a approximation with constant probability. 

We devote the rest of this section to the proof of Theorem 4.2. Consider a run $r$ of the algorithm.
Let $C_{r-1} \subseteq V$. In the following, we condition on the event that $\mathcal{C}_{r-1} = C_{r-1}$.

For each element $e \in V$, let $p_r(e) = \Pr_{X \sim \mathcal{V}(1/m)}[e \in \mathrm{AlgRel}(C_{r-1} \cup X \cup \{e\})]$ if $e \in \mathrm{OPT} \setminus C_{r-1}$,
and 0 otherwise. As shown in the following lemma, the probability $p_r(e)$ gives us a handle on the
probability that $e$ is in the union of the relevant sets.

Lemma 4.4. For each element $e \in \mathrm{OPT} \setminus C_{r-1}$,

$\Pr[e \in \bigcup_{1 \le i \le gm} \mathrm{AlgRel}(C_{r-1} \cup X_{i,r})] = 1 - (1 - p_r(e))^g$,

where $g$ is the number of groups into which the machines are partitioned.


Proof: For each group $G_j$, we can show that $e$ is not in the union of the relevant sets for that
group with probability $1 - p_r(e)$. Since different groups have independent partitions, $e$ is not in
the union of the relevant sets for all machines with probability $(1 - p_r(e))^g$, and the lemma follows.
More precisely, for each group $G_j$, let $Y_j$ be the event that $e \notin \bigcup_{i \in G_j} \mathrm{AlgRel}(C_{r-1} \cup X_{i,r})$. Let $G_{j,\ell}$
denote the $\ell$th machine in $G_j$. We have

$\Pr[Y_j] = \frac{1}{m} \sum_{\ell=1}^{m} \Pr[Y_j \mid e \text{ is on } G_{j,\ell}] = \frac{1}{m} \sum_{\ell=1}^{m} \Pr_{X_{\ell,r}}[e \notin \mathrm{AlgRel}(C_{r-1} \cup X_{\ell,r}) \mid e \in X_{\ell,r}]$

$= \frac{1}{m} \sum_{\ell=1}^{m} \Pr_{X \sim \mathcal{V}(1/m)}[e \notin \mathrm{AlgRel}(C_{r-1} \cup X \cup \{e\})] = 1 - p_r(e)$,

where the first equality follows from the fact that $e$ is assigned to a machine in $G_j$ chosen independently
and uniformly at random, and the third from the fact that the distribution of $X_{\ell,r} \sim \mathcal{V}(1/m)$
conditioned on $e \in X_{\ell,r}$ is identical to the distribution of $X \cup \{e\}$ with $X \sim \mathcal{V}(1/m)$. Since the
events $\{Y_j : 1 \le j \le g\}$ are mutually independent, $\Pr[\bigwedge_j Y_j] = \prod_{j=1}^{g} \Pr[Y_j] = (1 - p_r(e))^g$. □

Returning to the proof of Theorem 4.2, we define a partition $(P_r, Q_r)$ of $\mathrm{OPT} \setminus C_{r-1}$ as follows:

$P_r = \{e \in \mathrm{OPT} \setminus C_{r-1} : p_r(e) < \epsilon\}$,    $Q_r = \{e \in \mathrm{OPT} \setminus C_{r-1} : p_r(e) \ge \epsilon\}$.

The following subsets of $P_r$ and $Q_r$ are key to our analysis (recall that $X_{i,r}$ is the random sample
placed on machine $i$ at the beginning of the run $r$):

$P'_r = \{e \in P_r : e \notin \mathrm{AlgRel}(C_{r-1} \cup X_{i,r} \cup \{e\})\}$,    $Q'_r = Q_r \cap \left(\bigcup_{j=1}^{gm} \mathrm{AlgRel}(C_{r-1} \cup X_{j,r})\right)$.

Note that each element $e \in P_r$ is in $P'_r$ with probability $1 - p_r(e) \ge 1 - \epsilon$. Further, by Lemma 4.4,
each element $e \in Q_r$ is in $Q'_r$ with probability $1 - (1 - p_r(e))^g \ge 1 - (1 - \epsilon)^g$.

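It follows from the definition of $P'_r$ and the consistency property of Alg that

$\mathrm{AlgSol}(C_{r-1} \cup X_{i,r}) = \mathrm{AlgSol}(C_{r-1} \cup X_{i,r} \cup P'_r)$.

Let $\mathrm{OPT}_{r-1} = C_{r-1} \cap \mathrm{OPT}$ be the part of OPT in this iteration’s pool. Then, since Alg is an $\alpha$-approximation
and $P'_r \cup \mathrm{OPT}_{r-1} \subseteq \mathrm{OPT}$ is a feasible solution, we have

$f(\mathrm{AlgSol}(C_{r-1} \cup X_{i,r})) \ge \alpha \cdot f(P'_r \cup \mathrm{OPT}_{r-1})$.

Taking expectations on both sides, we have:

$\mathbb{E}[f(\mathrm{AlgSol}(C_{r-1} \cup X_{i,r}))] \ge \alpha \cdot \mathbb{E}[f(P'_r \cup \mathrm{OPT}_{r-1})] \ge (1 - \epsilon)\alpha \cdot f(P_r \cup \mathrm{OPT}_{r-1})$,    (1)

where the final inequality follows from Lemma 3.3, since $f$ is monotone when restricted to $\mathrm{OPT} \supseteq
P_r \cup \mathrm{OPT}_{r-1}$, and $P'_r$ contains every element of $P_r$ with probability at least $(1 - \epsilon)$.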

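Note that $Q'_r \subseteq (\mathrm{OPT} \cap \mathcal{C}_r) \setminus \mathrm{OPT}_{r-1}$. As before, $f$ is monotone when restricted to OPT.
Additionally, $Q'_r$ contains every element of $Q_r$ with probability at least $1/2$. Thus,

$\mathbb{E}[f(\mathcal{C}_r \cap \mathrm{OPT}) \mid \mathcal{C}_{r-1} = C_{r-1}] \ge \mathbb{E}[f(Q'_r \cup \mathrm{OPT}_{r-1})] \ge \frac{1}{2} \cdot f(Q_r \cup \mathrm{OPT}_{r-1}) + \frac{1}{2} \cdot f(\mathrm{OPT}_{r-1})$,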
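where the final inequality follows from Lemma 3.3. Rearranging this inequality using the condition
$\mathcal{C}_{r-1} = C_{r-1}$ and the definition $\mathrm{OPT}_{r-1} = C_{r-1} \cap \mathrm{OPT}$ we obtain:

$\mathbb{E}[f(\mathcal{C}_r \cap \mathrm{OPT}) - f(\mathcal{C}_{r-1} \cap \mathrm{OPT}) \mid \mathcal{C}_{r-1} = C_{r-1}] \ge \frac{1}{2}\left(f(Q_r \cup \mathrm{OPT}_{r-1}) - f(\mathrm{OPT}_{r-1})\right)$

$\ge \frac{1}{2}\left(f(P_r \cup Q_r \cup \mathrm{OPT}_{r-1}) - f(P_r \cup \mathrm{OPT}_{r-1})\right) = \frac{1}{2}\left(f(\mathrm{OPT}) - f(P_r \cup \mathrm{OPT}_{r-1})\right)$,    (2)

where the second inequality follows from submodularity.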

Now, if $f(P_r \cup (C_{r-1} \cap \mathrm{OPT})) \ge (1 - \epsilon) \cdot f(\mathrm{OPT})$ then this fact together with (1) implies that the first
property in the statement of Theorem 4.2 must hold. Otherwise, $f(\mathrm{OPT}) - f(P_r \cup (C_{r-1} \cap \mathrm{OPT})) >
\epsilon \cdot f(\mathrm{OPT})$; this fact together with (2) implies that the second property must hold.

This completes the description of our generic approach. In the following sections, we instantiate 
the algorithm Alg with the standard Greedy algorithm and a heavily discretized Continuous Greedy 
algorithm, and obtain our main results stated in the introduction. 

5 A Parallel Greedy Algorithm 

In this section, we combine the generic approach from Section 4 with the standard greedy algorithm,
and give our results for monotone maximization stated in Theorem 5.2.

We let Alg be the standard Greedy algorithm. We let $\mathrm{AlgRel}(N) = \mathrm{AlgSol}(N) = \mathrm{Greedy}(N)$. It
was shown in previous work that the Greedy algorithm satisfies the consistency property.

Lemma 5.1 ([12], Lemma 2). Let $A \subseteq V$ and $B \subseteq V$ be two disjoint subsets of $V$. Suppose that,
for each element $e \in B$, we have $\mathrm{Greedy}(A \cup \{e\}) = \mathrm{Greedy}(A)$. Then $\mathrm{Greedy}(A \cup B) = \mathrm{Greedy}(A)$.

Informally, this simply means that if Greedy rejects some element e when presented with input 
A U {e}, then adding other similarly rejected elements to A U {e} cannot cause e to be accepted. 
This allows us to immediately apply the result from Section 4 and obtain the following result.
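The consistency property of Lemma 5.1 can be observed empirically on a small instance. The sketch below is illustrative only and rests on assumptions not stated in the paper: the constraint is a cardinality constraint, ties are broken by a fixed order on the elements, and the coverage instance and all identifiers are hypothetical. It collects every element $e$ outside $A$ that Greedy rejects when run on $A \cup \{e\}$, so that one can check that adding all of them at once does not change the greedy solution.

```python
# Hypothetical coverage instance: element i covers four items modulo 20.
SETS = {i: {(3 * i + j) % 20 for j in range(4)} for i in range(30)}

def coverage(s):
    """Number of items covered by the elements of s."""
    return len(set().union(*(SETS[e] for e in s)))

def greedy(f, candidates, k):
    """Standard greedy under |S| <= k with ties broken by sorted order."""
    sol = []
    order = sorted(candidates)
    while len(sol) < k:
        base = f(frozenset(sol))
        best, best_gain = None, 0.0
        for e in order:
            if e in sol:
                continue
            gain = f(frozenset(sol) | {e}) - base
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no element has positive marginal gain
            break
        sol.append(best)
    return frozenset(sol)

def rejected_elements(f, a, others, k):
    """Elements e of `others` with Greedy(A + e) = Greedy(A)."""
    base = greedy(f, a, k)
    return frozenset(e for e in others if greedy(f, a | {e}, k) == base)
```

A deterministic tie-breaking rule matters here: if ties were broken arbitrarily from call to call, two runs on the same input could disagree and the comparison of greedy solutions would no longer be meaningful.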

Theorem 5.2. Let $f : 2^V \to \mathbb{R}_+$ be a submodular function, and $\mathcal{I}$ be a hereditary set system.
For any $\epsilon > 0$ there is a randomized distributed $O(1/\epsilon)$-round algorithm that can be implemented in
the model described in Section 2. The algorithm is an $(\alpha - O(\epsilon))$-approximation with constant probability
for the problem $\max_{S \in \mathcal{I}} f(S)$, where $\alpha$ is the approximation ratio of the standard, sequential
greedy algorithm for the same problem.

6 A Parallel Continuous Greedy Algorithm 

For monotone maximization subject to a matroid constraint, Theorem 5.2 guarantees only a $(1/2 - \epsilon)$
approximation, due to the limitations of the standard greedy algorithm. We obtain a nearly optimal
$(1 - 1/e - \epsilon)$ approximation by instantiating the framework in Section 4 with the DCGreedy algorithm
shown in Algorithm 1.

The DCGreedy algorithm is a heavily discretized version of the measured continuous greedy approach
of [15]: it first constructs an approximate fractional solution to the problem $\max_{x \in P(\mathcal{I})} F(x)$
of maximizing the multilinear extension $F$ of $f$ subject to the constraint that $x$ is in the matroid
polytope $P(\mathcal{I})$, and then rounds the fractional solution without loss using pipage rounding or swap
rounding mm.

In this section, we combine the generic approach from Section 4 with the DCGreedy algorithm.
We use DCGreedy as Alg; the relevant set $\mathrm{AlgRel}(N)$ is the set of elements in the support of the
fractional solution $x(1/\epsilon)$, and $\mathrm{AlgSol}(N)$ is the integral solution obtained by rounding $x(1/\epsilon)$.
Algorithm 1: Discretized Continuous Greedy (DCGreedy). 
Input: $N \subseteq V$ 

1  $x(0) \leftarrow 0$ 
2  for $t \leftarrow 1$ to $1/\epsilon$ do 
3      $y(t) \leftarrow \mathrm{GreedyStep}(N, x(t-1))$ 
4      $x(t) \leftarrow x(t-1) + y(t)$ 
5  $S \leftarrow \mathrm{SwapRounding}(x(1/\epsilon), \mathcal{I})$ 
6  Let $T$ be the support of $x(1/\epsilon)$ 
7  return $(S, T)$ 


Algorithm 2: Greedy Update Step (GreedyStep). 
Input: $N \subseteq V$, $x \in [0,1]^V$ 

1  $W \leftarrow \emptyset$, $y \leftarrow 0$ 
2  repeat 
3      $D \leftarrow \{e \in N \setminus W : W \cup \{e\} \in \mathcal{I}\}$ 
4      foreach $e \in D$ do 
5          $w_e \leftarrow \mathbb{E}[f(R(x+y) \cup \{e\}) - f(R(x+y))]$ 
6      Let $e^* = \arg\max_{e \in D} w_e$ 
7      if $D = \emptyset$ or $w_{e^*} < 0$ then return $y$ else 
8          $y_{e^*} \leftarrow y_{e^*} + \epsilon(1 - x_{e^*})$ 
9          $W \leftarrow W \cup \{e^*\}$ 


Figure 1: The discretized continuous greedy algorithm. On line 5 of Algorithm 2, for a vector 
$z \in [0,1]^V$, we use $R(z)$ to denote a random subset of $N$ that contains each element $e$ independently 
with probability $z_e$. The weights on line 5 cannot be computed exactly in polynomial time, but 
they can be efficiently approximated using random samples. 
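
The fractional phase of DCGreedy can be sketched in a few lines. This is an illustrative sketch only, under several assumptions that are not in the paper: the matroid is a uniform matroid of rank `rank`, the expectation on line 5 is computed exactly by enumerating all subsets (exponential time, so tiny ground sets only), ties are broken by smallest element id, and the final rounding step is omitted. The function names are hypothetical.

```python
from itertools import product


def expected_gain(f, ground, z, e):
    """Exact E[f(R(z) + e) - f(R(z))] by enumerating every outcome of R(z)."""
    elems = sorted(ground)
    total = 0.0
    for bits in product([0, 1], repeat=len(elems)):
        prob = 1.0
        R = set()
        for u, bit in zip(elems, bits):
            prob *= z[u] if bit else 1.0 - z[u]
            if bit:
                R.add(u)
        if prob > 0.0:
            total += prob * (f(R | {e}) - f(R))
    return total


def greedy_step(f, ground, x, eps, rank):
    """One GreedyStep for a uniform matroid: W stays independent iff |W| <= rank."""
    y = {u: 0.0 for u in ground}
    W = []
    while True:
        D = [e for e in sorted(ground) if e not in W and len(W) < rank]
        if not D:
            return y
        z = {u: x[u] + y[u] for u in ground}
        weights = {e: expected_gain(f, ground, z, e) for e in D}
        e_star = max(D, key=lambda e: weights[e])  # first maximizer in sorted order
        if weights[e_star] < 0:
            return y
        y[e_star] += eps * (1.0 - x[e_star])
        W.append(e_star)


def dc_greedy_fractional(f, ground, eps, rank):
    """Runs 1/eps update steps and returns the fractional solution x(1/eps)."""
    x = {u: 0.0 for u in ground}
    for _ in range(round(1 / eps)):
        y = greedy_step(f, ground, x, eps, rank)
        x = {u: x[u] + y[u] for u in ground}
    return x


# Toy coverage objective (assumed data).
cov = {0: {1, 2}, 1: {2, 3}, 2: {3}}
f_cov = lambda S: len(set().union(*[cov[u] for u in S]))
x_frac = dc_greedy_fractional(f_cov, [0, 1, 2], eps=0.5, rank=1)
```

Each update step touches at most `rank` coordinates, which is what keeps the support of the fractional solution, and hence AlgRel, small.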


Note that it is necessary to ensure that the fractional solution has small support so that the size of 
AlgRel(A) is small. We achieve this by heavily discretizing the continuous greedy algorithm, thereby 
limiting the number of support updates performed in lines 3 and 4 of DCGreedy. Unfortunately, 
performing this discretization naively introduces an error in the approximation that is too large. 
Thus, we make use of a key idea from [3], which can be applied in the case of a matroid constraint. 
This allows us to show the following lemma, whose proof is deferred to Section C. 

Lemma 6.1. The DCGreedy algorithm achieves a $(1 - 1/e - O(\epsilon))$ approximation for monotone 
functions and a $(1/e - O(\epsilon))$ approximation for non-monotone functions. 

The lemma above provides us with the desired approximation guarantees for DCGreedy, and 
thus it remains to show the consistency property. Before doing so, we must address how the weights 
are computed on line 5 of the GreedyStep algorithm (see Algorithm 2). Computing the weights 
exactly requires exponential time, but they can be approximated in polynomial time using random 
samples. In order to illustrate the main ideas behind the proof of consistency, we assume that the 
weights are computed exactly, since this will keep the algorithm deterministic. In the Appendix, we 
remove this assumption and we analyze the resulting randomized algorithm using an extension of 
our framework. 

Lemma 6.2. Let $A$ and $B$ be two disjoint subsets of $V$. Suppose that, for each element $e \in B$, we 
have $\mathrm{DCGreedyRel}(A \cup \{e\}) = \mathrm{DCGreedyRel}(A)$. Then $\mathrm{DCGreedySol}(A \cup B) = \mathrm{DCGreedySol}(A)$. 

Proof: We will show that the GreedyStep algorithm picks the same set $W$ on input $(A, x)$ and 
$(A \cup B, x)$, which implies the lemma. Suppose for contradiction that the algorithm makes different 
choices on input $(A, x)$ and $(A \cup B, x)$. Consider the first iteration where the two runs differ, and 
let $e$ be the element added to $W$ in that iteration on input $(A \cup B, x)$. Note that $e \notin A$ and thus we 
have $e \in B$. But then $e$ will be added to $W$ on input $(A \cup \{e\}, x)$. Thus $e \in \mathrm{DCGreedyRel}(A \cup \{e\})$, 
which contradicts the fact that $e \in B$. □ 
Thus we can apply the result from Section 4 and obtain the following result. 

Theorem 6.3. Let $f : 2^V \to \mathbb{R}_+$ be a submodular function, and $\mathcal{I} \subseteq 2^V$ be a matroid. For any 
$\epsilon > 0$ there is a randomized distributed $O(1/\epsilon)$-round algorithm that can be implemented in the model 
described earlier. The algorithm is an $(\alpha - O(\epsilon))$-approximation with constant probability for 
the problem $\max_{S \in \mathcal{I}} f(S)$, where $\alpha$ is $(1 - 1/e)$ for monotone $f$ and $1/e$ for general $f$. 

7 Faster Algorithms 

In this section, we build on the techniques from the previous sections to give a distributed algorithm 
for non-monotone maximization that requires only two rounds, rather than $O(1/\epsilon)$ rounds, and 
achieves an improved approximation guarantee over the two-round algorithm proposed in [12]. In 
the case of non-monotone maximization over a matroid, we show that our techniques can be used 
to obtain a new, fast sequential algorithm as well. 

7.1 Two-Round Algorithms For Non-Monotone Maximization 

We first give an improved two-round algorithm for non-monotone maximization subject to any 
hereditary constraint. The algorithm is similar to the earlier two-round algorithm for monotone maximization; perhaps 
surprisingly, we show that this approach achieves a good approximation even for non-monotone 
functions. We randomly partition the elements onto the $m$ machines, and run Greedy on the 
elements $V_i$ on machine $i$ to pick a set $S_i$. We place the sets $S_i$ on a single machine and we run any 
algorithm Alg on $B := \bigcup_i S_i$ to find a solution $T$. We return the best solution amongst $S_1, \dots, S_m, T$. 
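
The two rounds described above can be simulated on a single machine. The sketch below is illustrative only: it assumes a cardinality constraint $|S| \le k$, uses a graph cut function as the non-monotone objective, stops Greedy when no element has positive gain, and plugs Greedy itself in as a stand-in for Alg. All function names are hypothetical.

```python
import random


def greedy_k(f, ground, k):
    """Standard greedy under |S| <= k, stopping when no element has positive gain."""
    S = []
    for _ in range(k):
        candidates = [e for e in sorted(ground) if e not in S]
        if not candidates:
            break
        base = f(frozenset(S))
        e_star = max(candidates, key=lambda e: f(frozenset(S) | {e}) - base)
        if f(frozenset(S) | {e_star}) - base <= 0:
            break
        S.append(e_star)
    return S


def two_round(f, ground, k, m, alg, seed=0):
    """Round 1: Greedy on each random part. Round 2: alg on the pooled solutions."""
    rng = random.Random(seed)
    parts = [[] for _ in range(m)]
    for e in sorted(ground):
        parts[rng.randrange(m)].append(e)       # uniform random partition
    local = [greedy_k(f, part, k) for part in parts]
    B = sorted(set().union(*local))             # B := union of the S_i
    T = alg(f, B, k)
    best = max(local + [T], key=lambda S: f(frozenset(S)))
    return best, local, T


def cut_value(edges):
    """Cut function of an undirected graph: non-negative, submodular, non-monotone."""
    def f(S):
        return sum(1 for u, v in edges if (u in S) != (v in S))
    return f


# Toy instance (assumed data): a 6-cycle.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
f_cut = cut_value(cycle)
best, local, T = two_round(f_cut, range(6), k=2, m=3, alg=greedy_k, seed=1)
```

Returning the best of all $m + 1$ candidates is what lets the analysis trade off the Greedy solutions against the second-round solution.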

We analyze the algorithm for any hereditary constraint $\mathcal{I}$ for which the Greedy algorithm satisfies 
the following property (for some $\gamma$), which we refer to as the strong greedy property: 

\[ \forall S \in \mathcal{I}: \quad f(\mathrm{Greedy}(V)) \ge \gamma \cdot f(\mathrm{Greedy}(V) \cup S) \tag{GP} \]

By the standard Greedy analysis, we have $\gamma = 1/2$ for a matroid constraint and $\gamma = 1/(p+1)$ 
for a $p$-system constraint. 

Theorem 7.1. Suppose that Greedy satisfies the strong greedy property with constant $\gamma$ and let 
Alg be any $\beta$-approximation for the problem $\max_{S \in \mathcal{I}} f(S)$. Then there is a randomized, two-round 
distributed algorithm that achieves a $\left(1 - \frac{1}{m}\right)\frac{\beta\gamma}{\beta+\gamma}$ approximation in expectation for $\max_{S \in \mathcal{I}} f(S)$. 

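Proof: For each element $e$, we let $p_e = \Pr_{X \sim V(1/m)}[e \in \mathrm{Greedy}(X \cup \{e\})]$ if $e \in \mathrm{OPT}$, 
and 0 otherwise. Then, let $p \in [0,1]^V$ denote the vector whose entries are given by the probabilities 
$p_e$. 

We first analyze the expected value of the Greedy solution $S_i$. Let 

\[ O = \{e \in \mathrm{OPT} : e \notin \mathrm{Greedy}(V_i \cup \{e\})\}. \]

By Lemma 5.1, $\mathrm{Greedy}(V_i \cup O) = \mathrm{Greedy}(V_i) = S_i$, and by (GP), $f(S_i) \ge \gamma \cdot f(S_i \cup O)$. Therefore 

\begin{align*}
\mathbb{E}[f(S_i)] &\ge \gamma \cdot \mathbb{E}[f(S_i \cup O)] \\
 &= \gamma \cdot \mathbb{E}[f^-(\mathbf{1}_{S_i \cup O})] \\
 &\ge \gamma \cdot f^-(\mathbb{E}[\mathbf{1}_{S_i \cup O}]) \\
 &= \gamma \cdot f^-(\mathbb{E}[\mathbf{1}_{S_i}] + (\mathbf{1}_{\mathrm{OPT}} - p)). \tag{3}
\end{align*}

On line three, we have used the fact that $f^-$ is convex and on line four we have used the fact that 
$\mathbb{E}[\mathbf{1}_{S_i \cup O}] = \mathbb{E}[\mathbf{1}_{S_i}] + (\mathbf{1}_{\mathrm{OPT}} - p)$. 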
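Now consider the solution $T$. Since Alg is a $\beta$-approximation, we have 

\begin{align*}
\mathbb{E}[f(T)] &\ge \beta \cdot \mathbb{E}[f(B \cap \mathrm{OPT})] \\
 &= \beta \cdot \mathbb{E}[f^-(\mathbf{1}_{B \cap \mathrm{OPT}})] \\
 &\ge \beta \cdot f^-(\mathbb{E}[\mathbf{1}_{B \cap \mathrm{OPT}}]) \\
 &= \beta \cdot f^-(p). \tag{4}
\end{align*}

Similarly to above, we have used the convexity of $f^-$ and the fact that $\mathbb{E}[\mathbf{1}_{B \cap \mathrm{OPT}}] = p$. 
By combining (3) and (4), and using convexity of $f^-$, we obtain 

\[ \frac{1}{\gamma}\mathbb{E}[f(S_i)] + \frac{1}{\beta}\mathbb{E}[f(T)] \ge f^-(\mathbb{E}[\mathbf{1}_{S_i}] + (\mathbf{1}_{\mathrm{OPT}} - p)) + f^-(p) \ge 2 \cdot f^-\!\left(\frac{\mathbb{E}[\mathbf{1}_{S_i}] + \mathbf{1}_{\mathrm{OPT}}}{2}\right). \]

Since $S_i \subseteq V_i$ and $V_i$ is a $1/m$ sample of $V$, we have $\mathbb{E}[\mathbf{1}_{S_i}] \le \frac{1}{m} \cdot \mathbf{1}_V$. Therefore, using the definition 
of $f^-$ and the non-negativity of $f$, we obtain 

\[ 2 \cdot f^-\!\left(\frac{\mathbb{E}[\mathbf{1}_{S_i}] + \mathbf{1}_{\mathrm{OPT}}}{2}\right) \ge \left(1 - \frac{1}{m}\right) f(\mathrm{OPT}). \]

Thus 

\[ \max\{\mathbb{E}[f(S_i)], \mathbb{E}[f(T)]\} \ge \left(1 - \frac{1}{m}\right)\frac{\beta\gamma}{\beta+\gamma} \cdot f(\mathrm{OPT}). \]

□ 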
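
The step that invokes "the definition of $f^-$" can be made concrete with a small computation. The sketch below assumes the threshold form of the Lovász extension, $f^-(x) = \mathbb{E}_{\theta \sim U[0,1]}[f(\{e : x_e \ge \theta\})]$, and evaluates it at the worst case $\mathbb{E}[\mathbf{1}_{S_i}] = \frac{1}{m}\mathbf{1}_V$ on a toy cut function; the function name and the data are illustrative.

```python
def lovasz_extension(f, x):
    """f^-(x) = E_theta[f({e : x_e >= theta})] for a dict x with values in [0, 1]."""
    levels = sorted(set(x.values()) | {0.0, 1.0})
    total = 0.0
    for lo, hi in zip(levels, levels[1:]):
        # For every theta in (lo, hi] the level set equals {e : x_e >= hi}.
        level_set = frozenset(e for e, v in x.items() if v >= hi)
        total += (hi - lo) * f(level_set)
    return total


# Toy instance (assumed data): path 0-1-2-3 with OPT = {0, 1} and m = 4 machines.
path = [(0, 1), (1, 2), (2, 3)]
cut = lambda S: sum(1 for u, v in path if (u in S) != (v in S))
m = 4
opt = {0, 1}
# Midpoint of (1/m) * 1_V and 1_OPT, as in the bound above.
x_mid = {e: (1.0 / m + (1.0 if e in opt else 0.0)) / 2 for e in range(4)}
value = lovasz_extension(cut, x_mid)
```

In this vector the coordinates of OPT are at least $1/2$ and all others are at most $1/(2m)$, so the level set equals OPT for a $\theta$-range of length at least $\frac{1}{2} - \frac{1}{2m}$, which is where the factor $(1 - 1/m)$ comes from.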

Examples of results. We conclude this section with some examples of approximation guarantees 
that we can obtain using Theorem 7.1. For a matroid constraint, we have $\gamma = 1/2$ and, if we use 
the measured Continuous Greedy algorithm for Alg, we have $\beta = 1/e$; thus we obtain a $\left(1 - \frac{1}{m}\right)\frac{1}{2+e}$ 
approximation. We remark that, for a cardinality constraint, one can strengthen the proof of 
Theorem 7.1 slightly and obtain a $\left(1 - \frac{1}{m}\right)\frac{1}{e}\left(1 - \frac{1}{e}\right)$ approximation; we give the details in Section D. 
For a $p$-system constraint, we have $\gamma = 1/(p+1)$. We can use the algorithm of Gupta et al. [18] 
for Alg that achieves an approximation /3 = 3/ -|-4-|- when combined with the algorithm 

of [7] for unconstrained non-monotone maximization. Thus we obtain a 3 (l — + 7 + |^ 

approximation. 
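
The matroid and cardinality guarantees above follow from plugging $\beta$ and $\gamma$ into the bounds derived in the proofs. The helper below is a small arithmetic sketch; the function names are hypothetical, and the cardinality formula is written in the form that appears at the end of the proof in Section D.

```python
import math


def two_round_ratio(beta, gamma, m):
    """(1 - 1/m) * beta * gamma / (beta + gamma), the bound from Theorem 7.1."""
    return (1.0 - 1.0 / m) * beta * gamma / (beta + gamma)


def cardinality_ratio(beta, m):
    """(1 - 1/m) * c / (1 + c / beta) with c = 1 - 1/e, as in the Section D proof."""
    c = 1.0 - 1.0 / math.e
    return (1.0 - 1.0 / m) * c / (1.0 + c / beta)


# Matroid constraint: gamma = 1/2, beta = 1/e gives (1 - 1/m) / (2 + e).
matroid = two_round_ratio(1 / math.e, 0.5, m=10)
# Cardinality constraint with beta = 1/e gives (1 - 1/m) * (1/e) * (1 - 1/e).
cardinality = cardinality_ratio(1 / math.e, m=10)
```

With $\beta = 1/e$ the denominator $1 + e(1 - 1/e)$ collapses to $e$, which is why the cardinality bound simplifies so cleanly.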

7.2 A Fast Sequential Algorithm for Matroid Constraints 

We now show how our approach can be used to obtain a fast sequential algorithm for non-monotone 
maximization subject to a matroid constraint. The analysis given in Theorem 7.1 only relies on 
the following two properties of the Greedy algorithm: it satisfies (GP) and Lemma 5.1. Thus we 
can replace the Greedy algorithm by any algorithm satisfying these two properties. In particular, 
the Descending Thresholds Greedy (shown in Figure 2 as DThreshGreedy) of [20, 3] satisfies these 
conditions with $\gamma = 1/2 - \epsilon$. 

Our algorithm proceeds as follows. We randomly partition the elements into $m := 1/\epsilon$ samples 
$V_1, V_2, \dots, V_m$. On each sample $V_i$, we run the Descending Thresholds Greedy algorithm to 
obtain a solution $S_i$. Let $A := \arg\max_{S \in \{S_1, \dots, S_m\}} f(S)$ and $B := \bigcup_i S_i$. Then $|B| \le k/\epsilon$, where $k$ is the 
rank of the matroid. We run any $\beta$-approximation algorithm Alg on $B$ to find a solution $B'$, and 
we return the better of $A$ and $B'$. We obtain the following result. 
Algorithm 3: Descending Thresholds Greedy algorithm (DThreshGreedy). 
Input: $N \subseteq V$ 

$S \leftarrow \emptyset$, $d \leftarrow \max_{e \in N} f(\{e\})$ 
for ($w = d$; $w \ge \frac{\epsilon}{n} d$; $w \leftarrow w(1 - \epsilon)$) do 
    foreach $e \in N$ do 
        if $S \cup \{e\} \in \mathcal{I}$ and $f(S \cup \{e\}) - f(S) \ge w$ then 
            $S \leftarrow S \cup \{e\}$ 
return $S$ 


Figure 2: The Descending Thresholds Greedy algorithm of [20, 3]. 
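
A direct transcription of the descending-thresholds loop is given below as a sketch. It assumes an independence oracle `is_independent` and a lowest threshold of $\frac{\epsilon}{n} d$ with $n$ taken as the size of the input set; both the oracle interface and that choice of $n$ are assumptions made for this example.

```python
def dthresh_greedy(f, ground, is_independent, eps):
    """Sweep thresholds d, d(1-eps), d(1-eps)^2, ... and add any element that
    keeps S independent and has marginal gain at least the current threshold."""
    ground = sorted(ground)
    if not ground:
        return []
    S = []
    d = max(f({e}) for e in ground)
    n = len(ground)
    w = d
    while d > 0 and w >= eps * d / n:
        for e in ground:
            if e in S:
                continue
            if is_independent(S + [e]) and f(set(S) | {e}) - f(set(S)) >= w:
                S.append(e)
        w *= 1.0 - eps
    return S


# Toy instance (assumed data): coverage objective, uniform matroid of rank 2.
cover = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 2}}
f_cover = lambda S: len(set().union(*[cover[u] for u in S]))
picked = dthresh_greedy(f_cover, cover.keys(), lambda S: len(S) <= 2, eps=0.1)
```

Each threshold level makes a single pass over the input, so the number of function evaluations is the input size times the number of levels, in contrast to the repeated arg-max scans of the standard greedy algorithm.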


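Theorem 7.2. There is a sequential, randomized $\left(\frac{1}{2+e} - O(\epsilon)\right)$ approximation algorithm for the problem 
$\max_{S \in \mathcal{I}} f(S)$, where $\mathcal{I}$ is any matroid constraint, running in time $O\!\left(\frac{n}{\epsilon} \log n\right) + \mathrm{poly}\!\left(\frac{k}{\epsilon}\right)$. 

Proof: The running time of the Descending Thresholds Greedy algorithm on a ground set of size 
$s$ is $O\!\left(\frac{s}{\epsilon} \log\left(\frac{s}{\epsilon}\right)\right)$. Each random sample has size $O(\epsilon n)$ with high probability, and thus the total 
time needed to construct $B$ is $O\!\left(\frac{n}{\epsilon} \log n\right)$ with high probability. It follows from the analysis in 
Theorem 7.1 that the better of the two solutions $A$ and a $\beta$-approximation to $\max_{S \subseteq B,\, S \in \mathcal{I}} f(S)$ is a 
$(1 - \epsilon)\frac{\beta\gamma}{\beta+\gamma}$ approximation with $\gamma = 1/2 - \epsilon$. Using any $1/e$-approximation algorithm as Alg gives $\beta = 1/e$ and hence a $\frac{1}{2+e} - O(\epsilon)$ approximation. □ 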

7.3 Monotone Maximization with a Cardinality Constraint 

Here, we show how the previous techniques give a simple two-round algorithm that achieves a $1/2 - \epsilon$ 
approximation for monotone maximization subject to a cardinality constraint. 

Theorem 7.3. There is a randomized, two-round, distributed algorithm achieving a $\frac{1}{2} - \epsilon$ approximation 
in expectation for $\max_{S : |S| \le k} f(S)$, where $f$ is a monotone function. 

The algorithm. Let $\epsilon > 0$ be a parameter. The algorithm uses $O(\log(1/\epsilon)/\epsilon)$ groups of machines 
with $m$ machines in each group (and thus the total number of machines is $O(m \log(1/\epsilon)/\epsilon)$). 

We randomly distribute the ground set $V$ to the machines as follows. Amongst each group of $m$ 
machines, we partition $V$ uniformly at random; each element $e$ chooses an index $i \in [m]$ uniformly 
and independently at random and is assigned to the $i$th machine in the group. We do this separately 
for each group of machines, i.e., each element appears on exactly one machine in each group. 

We run Greedy on each of the machines to select a set of $k$ elements. Let $S$ be the union of all of 
the Greedy solutions. We place $S$ on a single machine together with a random sample $X \sim V(1/m)$. 
On this machine, we pick the final solution as follows. For each value $a \in \{0, 1, \dots, k\}$, we select 
a solution $T_a$ as follows. Let $\mathrm{Greedy}(f, X, a)$ denote the first $a$ elements chosen from the random 
sample $X$ using the greedy algorithm on objective function $f$. 

Then, let $T_a^1 = \mathrm{Greedy}(f, X, a)$ and define $g(A) = f(T_a^1 \cup A) - f(T_a^1)$ for each $A \subseteq V$. Note 
that $g$ is a non-negative, monotone submodular function. Let $T_a^2 = \mathrm{Greedy}(g, S, k - a)$; that is, 
we pick $k - a$ elements from $S$ using the Greedy algorithm with the function $g$ as input. We set 
$T_a = T_a^1 \cup T_a^2$. The final solution $T$ is the best of the $k + 1$ solutions $T_a$, where $a \in \{0, 1, \dots, k\}$. 
The analysis. In the following, we show that the algorithm above is a $1/2 - \epsilon$ approximation. For 
each element $e$, we define a probability $p_e = \Pr_{X \sim V(1/m)}[e \in \mathrm{Greedy}(X \cup \{e\})]$ if $e \in \mathrm{OPT}$, and 0 
otherwise. We define a partition $(O_1, O_2)$ of OPT as follows: 

\[ O_1 = \{e \in \mathrm{OPT} : p_e < \epsilon\}, \qquad O_2 = \{e \in \mathrm{OPT} : p_e \ge \epsilon\}. \]

Let $a = |O_1|$ and let 

\[ O_1' = \{e \in O_1 : e \notin \mathrm{Greedy}(f, X \cup \{e\}, a)\}. \]

By the consistency property of the greedy algorithm (Lemma 5.1), 

\[ T_a^1 = \mathrm{Greedy}(f, X, a) = \mathrm{Greedy}(f, X \cup O_1', a). \]

Additionally, for a cardinality constraint, Greedy satisfies (GP) with $\gamma = 1/2$ (see Subsection 7.1). 
Therefore 

\begin{align*}
f(T_a^1) &\ge \tfrac{1}{2} f(T_a^1 \cup O_1'), \tag{5} \\
g(T_a^2) &\ge \tfrac{1}{2} g(T_a^2 \cup (O_2 \cap S)). \tag{6}
\end{align*}

The inequality (6) can be rewritten as 

\[ f(T_a^1 \cup T_a^2) - f(T_a^1) \ge \tfrac{1}{2}\left(f(T_a^1 \cup T_a^2 \cup (O_2 \cap S)) - f(T_a^1)\right). \tag{7} \]

Adding (5) and (7), we obtain 

\begin{align*}
f(T_a) &\ge \tfrac{1}{2}\left(f(T_a^1 \cup O_1') + f(T_a^1 \cup T_a^2 \cup (O_2 \cap S)) - f(T_a^1)\right) \\
 &\ge \tfrac{1}{2} f(T_a \cup O_1' \cup (O_2 \cap S)) \\
 &\ge \tfrac{1}{2} f(O_1' \cup (O_2 \cap S)),
\end{align*}

where the last two inequalities follow from submodularity and monotonicity. Note that each element 
$e \in O_1$ is in $O_1'$ with probability $1 - p_e \ge 1 - \epsilon$. Each element $e \in O_2$ is in the union of the Greedy 
solutions from a given group of machines with probability $p_e \ge \epsilon$. Since there are $O(\log(1/\epsilon)/\epsilon)$ 
groups of machines and the groups have independent partitions, $e$ is in $S$ with probability at least 
$1 - \epsilon$. Therefore 

\[ \mathbb{E}[\mathbf{1}_{O_1' \cup (O_2 \cap S)}] \ge (1 - \epsilon)\mathbf{1}_{\mathrm{OPT}}. \]

Thus 

\[ \mathbb{E}[f(T_a)] \ge \tfrac{1}{2} f^-(\mathbb{E}[\mathbf{1}_{O_1' \cup (O_2 \cap S)}]) \ge (1 - \epsilon)\tfrac{1}{2} f(\mathrm{OPT}). \]

In the last inequality, we have used that if $x \ge y$ component-wise and $f$ is monotone, $f^-(x) \ge f^-(y)$. 
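A Proof of Theorem 4.3 

Theorem 4.3. ParallelAlg achieves a $(1 - \epsilon)^3\alpha$ approximation with constant probability. 

Proof: Let $R = c/\epsilon$ be the total number of runs, and $C = (C_0, C_1, \dots, C_R)$. Let $I_r(\mathcal{C}_{r-1}) \in \{0, 1\}$ 
be equal to 1 if and only if 

\[ \mathbb{E}_{X_{1,r}}[f(\mathrm{AlgSol}(\mathcal{C}_{r-1} \cup X_{1,r}))] \ge (1 - \epsilon)^2\alpha \cdot f(\mathrm{OPT}). \]

Let 

\[ \Phi_r(C) = I_r(C_{r-1}) + \frac{2\left(f(C_r \cap \mathrm{OPT}) - f(C_{r-1} \cap \mathrm{OPT})\right)}{\epsilon f(\mathrm{OPT})}, \qquad \Phi(C) = \sum_{r=1}^{R} \Phi_r(C). \]

Taking expectation over the random choices of $C$, and noting that the second terms telescope to at most $2/\epsilon$, we have 

\[ \mathbb{E}[\Phi(C)] \le \sum_{r=1}^{R} \mathbb{E}[I_r(C_{r-1})] + \frac{2}{\epsilon}. \]

On the other hand, by Theorem 4.2, $\mathbb{E}[\Phi_r(C)] \ge 1$ for each $r$ and therefore $\mathbb{E}[\Phi(C)] \ge R$. Thus 

\[ \frac{2}{\epsilon} + \sum_{r=1}^{R} \mathbb{E}[I_r(C_{r-1})] \ge \mathbb{E}[\Phi(C)] \ge R. \]

Since $R \ge 6/\epsilon$, we have 

\[ \sum_{r=1}^{R} \mathbb{E}[I_r(C_{r-1})] \ge \frac{2R}{3}. \]

Therefore, with probability at least 2/3, there exists a run $r$ such that $I_r(C_{r-1}) = 1$. Fix the 
randomness up to the first such run, i.e., condition on a fixed $C_{r-1} = \mathcal{C}_{r-1}$ such that $I_r(\mathcal{C}_{r-1}) = 1$ 
and $C_r, \dots, C_R$ remain random. Assume for contradiction that with probability at least $1 - \epsilon\alpha(1 - \epsilon)^2$ 
over the choices of $X_{1,r}$, 

\[ f(\mathrm{AlgSol}(\mathcal{C}_{r-1} \cup X_{1,r})) < (1 - \epsilon)^3\alpha \cdot f(\mathrm{OPT}). \]

Then we have 

\begin{align*}
\mathbb{E}[f(\mathrm{AlgSol}(\mathcal{C}_{r-1} \cup X_{1,r}))] &< \left(\epsilon\alpha(1 - \epsilon)^2 + (1 - \epsilon\alpha(1 - \epsilon)^2)(1 - \epsilon)^3\alpha\right) f(\mathrm{OPT}) \\
 &= \left(\epsilon + (1 - \epsilon\alpha(1 - \epsilon)^2)(1 - \epsilon)\right)(1 - \epsilon)^2\alpha\, f(\mathrm{OPT}) \\
 &< (1 - \epsilon)^2\alpha\, f(\mathrm{OPT}),
\end{align*}

contradicting our assumption on $\mathcal{C}_{r-1}$. Thus, with probability at least $\epsilon\alpha(1 - \epsilon)^2$, we have 

\[ f(\mathrm{AlgSol}(\mathcal{C}_{r-1} \cup X_{1,r})) \ge (1 - \epsilon)^3\alpha \cdot f(\mathrm{OPT}). \]

Notice that the above argument applies not only to machine 1 in run $r$ but also to the first machine 
in each of the $g$ groups in the same run $r$, and their random samples $X_{i,r}$ are independent. 
Thus, since $g \ge c/(\epsilon\alpha)$ for a sufficiently large constant $c$, with probability at least 5/6, we have 
$\max_i f(\mathrm{AlgSol}(\mathcal{C}_{r-1} \cup X_{i,r})) \ge (1 - \epsilon)^3\alpha \cdot f(\mathrm{OPT})$. Overall, the algorithm succeeds with probability 
at least $2/3 \cdot 5/6 = 5/9$. □ 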
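
The algebra in the contradiction step can be verified by direct expansion. The following worked restatement uses the shorthand $q = \epsilon\alpha(1-\epsilon)^2$, which is introduced here for readability and is not notation from the paper; the first term bounds the value by $f(\mathrm{OPT})$ on the event of probability at most $q$.

```latex
\begin{align*}
q \cdot 1 + (1 - q)(1 - \epsilon)^3 \alpha
  &= \bigl(\epsilon + (1 - q)(1 - \epsilon)\bigr)(1 - \epsilon)^2 \alpha \\
  &= \bigl(1 - q(1 - \epsilon)\bigr)(1 - \epsilon)^2 \alpha \\
  &= \bigl(1 - \epsilon\alpha(1 - \epsilon)^3\bigr)(1 - \epsilon)^2 \alpha
   \;<\; (1 - \epsilon)^2 \alpha .
\end{align*}
```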
B A Framework for Parallelizing Randomized Algorithms 

In this section, we extend the framework from Section 4 to the setting in which the sequential 
algorithm Alg is randomized. 

We represent the randomness of Alg as a vector $b \sim \mathcal{D}$ drawn from some distribution $\mathcal{D}$. It is 
convenient to have the randomness $b$ given as input to the algorithm. More precisely, we assume 
that Alg takes as input a subset $N \subseteq V$ and a random vector $b \sim \mathcal{D}$ and returns a pair of sets, 
$\mathrm{AlgSol}(N, b)$ and $\mathrm{AlgRel}(N, b)$. We assume that the size of $b$ depends only on the size of $V$, and 
hence is independent of the size of $N$. Finally, we assume that Alg has the following properties. 


1. ($(\alpha, \epsilon, \delta)$-Approximation) Let $\mathrm{OPT} = \arg\max_{S \in \mathcal{I}:\, S \subseteq V} f(S)$ be an optimal solution over the 
entire ground set $V$. Let $A \subseteq V$ and $b \sim \mathcal{D}$. Let $B \subseteq \mathrm{OPT}$ be a subset such that, for each 
$e \in B$, $e \notin \mathrm{AlgRel}(A \cup \{e\}, b)$. We have 

\[ \Pr\left[f(\mathrm{AlgSol}(A \cup B, b)) \ge \alpha \cdot f((A \cup B) \cap \mathrm{OPT}) - \epsilon f(\mathrm{OPT})\right] \ge 1 - \delta. \]

2. (Consistency) Let $b$ be any fixed vector. Let $A$ and $B$ be two disjoint subsets of $V$. Suppose 
that, for each element $e \in B$, we have $\mathrm{AlgRel}(A \cup \{e\}, b) = \mathrm{AlgRel}(A, b)$. Then $\mathrm{AlgSol}(A \cup B, b) = \mathrm{AlgSol}(A, b)$. 

Note that our assumption that the length of $b$ is independent of the size of the input subset 
allows expressions such as $\mathrm{Alg}(A \cup \{e\}, b)$ and $\mathrm{Alg}(A, b)$ to both make sense despite the fact that 
$|A \cup \{e\}| \ne |A|$. 

Our algorithm works exactly as that described in Section 4 with the exception that each machine 
$i$ in round $r$ now additionally samples a random vector $b_{i,r} \sim \mathcal{D}$. Then, on each machine, we 
run Alg on the set $V_{i,r} := X_{i,r} \cup C$ of elements on the machine and obtain $\mathrm{AlgSol}(V_{i,r}, b_{i,r})$ and 
$\mathrm{AlgRel}(V_{i,r}, b_{i,r})$. As in Section 4, the union $\bigcup_i \mathrm{AlgRel}(V_{i,r}, b_{i,r})$ of relevant elements is added to $C$, 
and the solution $S_{\mathrm{best}}$ is replaced by the best solution among $\{\mathrm{AlgSol}(V_{i,r}, b_{i,r}) : 1 \le i \le M\}$ and 
$S_{\mathrm{best}}$. 

In the final round we place $C$ on a single machine, sample a random vector $b \sim \mathcal{D}$, and run Alg 
on $C$ and $b$ to obtain the solution $\mathrm{AlgSol}(C, b)$. The final solution is the best among $\mathrm{AlgSol}(C, b)$ 
and $S_{\mathrm{best}}$. 

Analysis. The number of rounds, number of machines, space per machine, and amount of communication 
are the same as in Section 4. Thus, we focus on the approximation guarantee of the 
parallel algorithm. Using Theorem B.1 instead of Theorem 4.2, we can then finish the analysis in 
almost the same way as the deterministic case. The only difference is that instead of arguing that 
the algorithm works well with most of the random choices for $X_{i,r}$ as before, the proof now argues 
that the algorithm works well with most of the random choices for both $X_{i,r}$ and $b_{i,r}$. Nonetheless, 
the same proof except for this substitution works. 

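Theorem B.1. Consider a run $r \ge 1$ of the algorithm. Let $\mathcal{C}_{r-1} \subseteq V$. Then one of the following 
must hold: 

(1) $\mathbb{E}_{X_{1,r}, b_{1,r}}[f(\mathrm{AlgSol}(\mathcal{C}_{r-1} \cup X_{1,r}, b_{1,r})) \mid C_{r-1} = \mathcal{C}_{r-1}] \ge (\alpha - O(\epsilon)) \cdot f(\mathrm{OPT})$, or 

(2) $\mathbb{E}[f(C_r \cap \mathrm{OPT}) \mid C_{r-1} = \mathcal{C}_{r-1}] - f(\mathcal{C}_{r-1} \cap \mathrm{OPT}) \ge \frac{\epsilon}{2} \cdot f(\mathrm{OPT})$. 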
Proof: Consider a run $r$ of the algorithm. Let $\mathcal{C}_{r-1} \subseteq V$. In the following, we condition on the 
event that $C_{r-1} = \mathcal{C}_{r-1}$. 

For each element $e \in V$, let $p_r(e) = \Pr_{X \sim V(1/m),\, b \sim \mathcal{D}}[e \in \mathrm{AlgRel}(\mathcal{C}_{r-1} \cup X \cup \{e\}, b)]$ if $e \in 
\mathrm{OPT} \setminus \mathcal{C}_{r-1}$, and 0 otherwise. The proof of the following lemma is exactly the same as Lemma 4.4 
and thus is omitted. 

Lemma B.2. For each element $e \in \mathrm{OPT} \setminus \mathcal{C}_{r-1}$, 

\[ \Pr\Bigl[e \in \bigcup_{1 \le i \le gm} \mathrm{AlgRel}(\mathcal{C}_{r-1} \cup X_{i,r}, b_{i,r})\Bigr] = 1 - (1 - p_r(e))^g, \]

where $g$ is the number of groups into which the machines are partitioned. 

We define a partition $(P_r, Q_r)$ of $\mathrm{OPT} \setminus \mathcal{C}_{r-1}$ as follows: 

\[ P_r = \{e \in \mathrm{OPT} \setminus \mathcal{C}_{r-1} : p_r(e) < \epsilon\}, \qquad Q_r = \{e \in \mathrm{OPT} \setminus \mathcal{C}_{r-1} : p_r(e) \ge \epsilon\}. \]

The following subsets of $P_r$ and $Q_r$ are key to our analysis (recall that $X_{i,r}$ is the random 
sample placed on machine $i$ at the beginning of the run and $b_{i,r}$ is the random vector sampled by 
machine $i$ in round $r$): 

\[ P_r' = \{e \in P_r : e \notin \mathrm{AlgRel}(\mathcal{C}_{r-1} \cup X_{1,r} \cup \{e\}, b_{1,r})\}, \qquad Q_r' = Q_r \cap \Bigl(\bigcup_i \mathrm{AlgRel}(\mathcal{C}_{r-1} \cup X_{i,r}, b_{i,r})\Bigr). \]

Note that each element $e \in P_r$ is in $P_r'$ with probability $1 - p_r(e) \ge 1 - \epsilon$. Further, by Lemma B.2, 
each element $e \in Q_r$ is in $Q_r'$ with probability $1 - (1 - p_r(e))^g \ge 1 - (1 - \epsilon)^g \ge \frac{1}{2}$. 

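It follows from the definition of $P_r'$ and the consistency property of Alg that 

\[ \mathrm{AlgSol}(\mathcal{C}_{r-1} \cup X_{1,r}, b_{1,r}) = \mathrm{AlgSol}(\mathcal{C}_{r-1} \cup X_{1,r} \cup P_r', b_{1,r}). \tag{8} \]

Let $\mathrm{OPT}_{r-1} = \mathcal{C}_{r-1} \cap \mathrm{OPT}$ be the part of OPT in this iteration's pool. We apply the $(\alpha, \epsilon, \delta)$-approximation 
property with $A = \mathcal{C}_{r-1} \cup X_{1,r}$, $b = b_{1,r}$, and $B = P_r'$ to obtain 

\[ \Pr_{b_{1,r}}\left[f(\mathrm{AlgSol}(\mathcal{C}_{r-1} \cup X_{1,r} \cup P_r', b_{1,r})) \ge \alpha \cdot f((\mathcal{C}_{r-1} \cup X_{1,r} \cup P_r') \cap \mathrm{OPT}) - \epsilon f(\mathrm{OPT})\right] \ge 1 - \delta. \]

Since $f$ is monotone when restricted to OPT, and $P_r' \cup \mathrm{OPT}_{r-1} \subseteq (\mathcal{C}_{r-1} \cup X_{1,r} \cup P_r') \cap \mathrm{OPT}$, this 
inequality implies that 

\[ \Pr_{b_{1,r}}\left[f(\mathrm{AlgSol}(\mathcal{C}_{r-1} \cup X_{1,r} \cup P_r', b_{1,r})) \ge \alpha \cdot f(P_r' \cup \mathrm{OPT}_{r-1}) - \epsilon f(\mathrm{OPT})\right] \ge 1 - \delta. \]

Therefore, equation (8) gives 

\[ \Pr_{b_{1,r}}\left[f(\mathrm{AlgSol}(\mathcal{C}_{r-1} \cup X_{1,r}, b_{1,r})) \ge \alpha \cdot f(P_r' \cup \mathrm{OPT}_{r-1}) - \epsilon f(\mathrm{OPT})\right] \ge 1 - \delta. \]

Taking expectation on both sides gives 

\begin{align*}
\mathbb{E}_{X_{1,r}, b_{1,r}}[f(\mathrm{AlgSol}(\mathcal{C}_{r-1} \cup X_{1,r}, b_{1,r}))] &\ge (1 - \delta)\alpha \cdot \mathbb{E}_{X_{1,r}}[f(P_r' \cup \mathrm{OPT}_{r-1})] - \epsilon f(\mathrm{OPT}) \\
 &\ge \alpha \cdot \mathbb{E}[f(P_r' \cup \mathrm{OPT}_{r-1})] - (\epsilon + \alpha\delta) f(\mathrm{OPT}) \\
 &\ge (1 - \epsilon)\alpha \cdot f(P_r \cup \mathrm{OPT}_{r-1}) - (\epsilon + \alpha\delta) f(\mathrm{OPT}) \\
 &\ge (1 - \epsilon)\alpha \cdot f(P_r \cup \mathrm{OPT}_{r-1}) - 2\epsilon f(\mathrm{OPT}). \tag{9}
\end{align*}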
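Here, the second inequality follows from the fact that $f$ is monotone restricted to $\mathrm{OPT} \supseteq (P_r' \cup 
\mathrm{OPT}_{r-1})$, the third from Lemma 3.3 and the fact that every element of $P_r$ appears in $P_r'$ with 
probability at least $(1 - \epsilon)$, and the last from our assumption that $\alpha\delta \le \epsilon$. 

Next, note that $Q_r' \subseteq (C_r \cap \mathrm{OPT}) \setminus \mathrm{OPT}_{r-1}$. This together with monotonicity of $f$ restricted 
to OPT imply: 

\begin{align*}
\mathbb{E}[f(C_r \cap \mathrm{OPT}) \mid C_{r-1} = \mathcal{C}_{r-1}] &\ge \mathbb{E}[f(Q_r' \cup \mathrm{OPT}_{r-1})] \\
 &\ge \tfrac{1}{2} \cdot f(Q_r \cup \mathrm{OPT}_{r-1}) + \tfrac{1}{2} \cdot f(\mathrm{OPT}_{r-1}),
\end{align*}

where the last inequality follows from Lemma 3.3 and Lemma B.2. Rearranging this inequality 
using the condition $C_{r-1} = \mathcal{C}_{r-1}$ and the definition $\mathrm{OPT}_{r-1} = \mathcal{C}_{r-1} \cap \mathrm{OPT}$ we obtain: 

\begin{align*}
\mathbb{E}[f(C_r \cap \mathrm{OPT}) - f(C_{r-1} \cap \mathrm{OPT}) \mid C_{r-1} = \mathcal{C}_{r-1}] &\ge \tfrac{1}{2}\left(f(Q_r \cup \mathrm{OPT}_{r-1}) - f(\mathrm{OPT}_{r-1})\right) \\
 &\ge \tfrac{1}{2}\left(f(P_r \cup Q_r \cup \mathrm{OPT}_{r-1}) - f(P_r \cup \mathrm{OPT}_{r-1})\right) \\
 &= \tfrac{1}{2}\left(f(\mathrm{OPT}) - f(P_r \cup (\mathcal{C}_{r-1} \cap \mathrm{OPT}))\right), \tag{10}
\end{align*}

where the second inequality follows from submodularity. 

Now, if $f(P_r \cup (\mathcal{C}_{r-1} \cap \mathrm{OPT})) \ge (1 - \epsilon) \cdot f(\mathrm{OPT})$ then this fact together with (9) imply the first 
property in the statement of Theorem B.1 must hold. Otherwise, $f(\mathrm{OPT}) - f(P_r \cup (\mathcal{C}_{r-1} \cap \mathrm{OPT})) > 
\epsilon \cdot f(\mathrm{OPT})$; this fact together with (10) imply that the second property must hold. □ 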

C Analysis of DCGreedy for the Application of the Randomized Framework 

In this section, we show that we can instantiate the randomized framework from Section B with a 
modified DCGreedy algorithm, and obtain the results stated in Section 6. Specifically, we extend 
the DCGreedy algorithm to the setting in which the weights $w_e$ on line 5 of GreedyStep are evaluated 
approximately via samples. The resulting GreedyStep is shown in Algorithm 4. 
We devote the rest of this section to proving the following result. 

Theorem C.1. The modified DCGreedy algorithm with approximate evaluation of the $w_e$'s satisfies the 
consistency property and the $(\alpha, \epsilon, \delta)$-approximation property with $\delta = 1/n$ and $\alpha = 1/e - O(\epsilon)$ for 
non-monotone functions and $\alpha = 1 - 1/e - O(\epsilon)$ for monotone functions. 

We begin by verifying that the consistency property holds. Consider a vector $b$ and two subsets 
$A, B \subseteq V$ such that, for each element $e \in B$, we have $\mathrm{DCGreedyRel}(A \cup \{e\}, b) = \mathrm{DCGreedyRel}(A, b)$. 
We shall show that the approximate GreedyStep algorithm always picks the same set $W$ on input 
$(A, x, b)$ and $(A \cup B, x, b)$. (Note that, since the two runs have the same randomness $b$, they will 
use the same approximate weights.) Suppose for contradiction that the algorithm makes different 
choices on input $(A, x, b)$ and $(A \cup B, x, b)$. Consider the first iteration where the two runs differ, 
and let $e$ be the element added to $W$ in that iteration on input $(A \cup B, x, b)$. Note that $e \notin A$ 
and thus we have $e \in B$. But then $e$ would be added to $W$ on input $(A \cup \{e\}, x, b)$, as well. Thus 
$e \in \mathrm{DCGreedyRel}(A \cup \{e\}, b)$, which contradicts the fact that $e \in B$. Thus the consistency property 
holds. 

Now we verify that the $(\alpha, \epsilon, \delta)$-approximation property holds. The analysis of the modified 
DCGreedy algorithm is similar to the analyses in Hals]. 
Algorithm 4: Greedy Update Step (GreedyStep) with randomized approximation of $w_e$'s. 
Input: $N \subseteq V$, $x \in [0,1]^V$ 

1   $W \leftarrow \emptyset$ 
2   $y \leftarrow 0$ 
3   repeat 
4       Let $D \leftarrow \{e \in N \setminus W : W \cup \{e\} \in \mathcal{I}\}$ 
5       Pick $\ell$ independent random samples for $R(x + y)$ 
6       foreach $e \in D$ do 
7           $w_e \leftarrow$ approximation of $\mathbb{E}[f(R(x + y) \cup \{e\}) - f(R(x + y))]$ via above samples 
8       Let $e^* = \arg\max_{e \in D} \{w_e\}$ 
9       if $D = \emptyset$ or $w_{e^*} < 0$ then 
10          return $y$ 
11      else 
12          $y_{e^*} \leftarrow y_{e^*} + \epsilon(1 - x_{e^*})$ 
13          $W \leftarrow W \cup \{e^*\}$ 


Figure 3: Discretized continuous greedy (DCGreedy) with approximate evaluation of the $w_e$'s on 
line 7. The weight $w_e$ is estimated as follows. Given $\ell$ independent random sets $R_1, \dots, R_\ell$ (the 
samples for $R(x + y)$), $w_e$ is set to $\frac{1}{\ell}\sum_{i=1}^{\ell}\left(f(R_i \cup \{e\}) - f(R_i)\right)$. 

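Lemma C.2. Let $\mathcal{I}$ be a matroid on $V$ and $\mathrm{OPT} = \arg\max_{S \in \mathcal{I}:\, S \subseteq V} f(S)$. Let $A \subseteq V$ and $b \sim \mathcal{D}$. 
Let $B \subseteq \mathrm{OPT}$ be a subset such that, for each $e \in B$, $e \notin \mathrm{DCGreedy}(A \cup \{e\}, b)$. Then, we have 

\[ \Pr\left[F(\mathrm{DCGreedy}(A \cup B, b)) \ge \alpha \cdot f((A \cup B) \cap \mathrm{OPT}) - \epsilon \cdot f(\mathrm{OPT})\right] \ge 1 - 1/n, \]

where $\alpha = (1 - 1/e - O(\epsilon))$ for monotone $f$ and $(1/e - O(\epsilon))$ for general $f$. 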

In the remainder of this section, we prove Lemma C.2. If $x, y \in [0,1]^V$, we denote by $x \vee y$ 
the vector such that $(x \vee y)_i = \max\{x_i, y_i\}$. Similarly, $x \wedge y$ is the vector such that $(x \wedge y)_i = 
\min\{x_i, y_i\}$. Let $\mathrm{OPT} = \arg\max_{S \subseteq V,\, S \in \mathcal{I}} f(S)$ be an optimal solution over the entire ground set $V$, 
and consider the execution of $\mathrm{DCGreedy}(A \cup \mathrm{OPT}, b)$. Let $Z$ be the set of vectors that $\mathrm{DCGreedy}(A \cup 
\mathrm{OPT}, b)$ considers when computing the weights of elements, i.e., the set of all vectors $z := x + y$, 
where $x = x(t)$ for some iteration $t$ of $\mathrm{DCGreedy}(A \cup \mathrm{OPT}, b)$ and $y$ is the vector constructed by 
previous iterations of $\mathrm{GreedyStep}(A \cup \mathrm{OPT}, x, b)$. Formally, we associate the vector $z_j \in Z$ with 
the $j$th execution of GreedyStep's main loop (counted across all the iterations of DCGreedy). Note 
that $|Z| \le s/\epsilon$, since GreedyStep's loop is executed at most $s$ times for each of the $1/\epsilon$ iterations of 
DCGreedy. For each sample, the random string $b$ can simply store one random threshold in $[0,1]$ per element. 
For a given vector $z$, these thresholds can be used to round $z$ to an integral indicator vector (a 
sample of $R(z)$) in order to estimate $\mathbb{E}[f(R(z) \cup \{e\}) - f(R(z))]$. 
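
The threshold-based rounding described above can be sketched as follows. This is an illustrative sketch, assuming that $b$ is stored as a list of per-sample threshold dictionaries and that an element $u$ is included in a sample when its threshold is below $z_u$; the function names and the data layout are assumptions made for this example.

```python
import random


def make_randomness(ground, num_samples, seed):
    """Random string b: for each sample, one uniform threshold in [0, 1) per element."""
    rng = random.Random(seed)
    return [{e: rng.random() for e in sorted(ground)} for _ in range(num_samples)]


def estimate_weight(f, z, e, b):
    """Average of f(R_i + e) - f(R_i), where R_i rounds z using the i-th thresholds."""
    total = 0.0
    for thresholds in b:
        # u is kept with probability z[u], since its threshold is uniform in [0, 1).
        R = {u for u, zu in z.items() if thresholds[u] < zu}
        total += f(R | {e}) - f(R)
    return total / len(b)


# Toy coverage objective (assumed data).
cov_sets = {0: {1, 2}, 1: {2, 3}, 2: {3}}
f_c = lambda S: len(set().union(*[cov_sets[u] for u in S]))
b = make_randomness([0, 1, 2], num_samples=5, seed=7)
z_int = {0: 1.0, 1: 0.0, 2: 0.0}      # integral z: every sample equals {0}
w_1 = estimate_weight(f_c, z_int, 1, b)
```

Because the thresholds are fixed once $b$ is drawn, two runs that reach the same vector $z$ see identical estimates, which is the property the consistency argument relies on.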

Consider the $j$th time GreedyStep executes line 7, and suppose that for each element $e \in A \cup \mathrm{OPT}$ 
we compute a weight $w_e(z_j, b)$, by using $\ell$ random samples encoded by $b$ to estimate $R(z_j)$, as in 
GreedyStep. We say that $w_e(z_j, b)$ is a good estimate if 

\[ \left|w_e(z_j, b) - \mathbb{E}[f(R(z_j) \cup \{e\}) - f(R(z_j))]\right| \le \frac{\epsilon}{2s} f(\mathrm{OPT}) + \frac{\epsilon}{2}\,\mathbb{E}[f(R(z_j) \cup \{e\}) - f(R(z_j))]. \]

We say that $b$ is good if all of the weights $\{w_e(z_j, b) : z_j \in Z, e \in A \cup \mathrm{OPT}\}$ are good estimates. 
Lemma C.3. The randomness $b$ is good with probability at least $1 - 1/n$. 

Proof: Let $d = \max_{e \in V} f(e) \le f(\mathrm{OPT})$. Consider a weight $w_e$ and let $R_1, \dots, R_\ell$ denote the 
independent random sets used to compute $w_e$ in line 7 of GreedyStep. For each $i \in [\ell]$, let $w_{e,i} = 
f(R_i \cup \{e\}) - f(R_i)$. Note that, by submodularity, $w_{e,i} \le d \le f(\mathrm{OPT})$. We use the following version 
of the Chernoff bound. 

Lemma C.4 (Lemma 2.3 in [3]). Let $X_1, \dots, X_m$ be independent random variables such that for 
each $i$, $X_i \in [0,1]$. Let $X = \frac{1}{m}\sum_{i} X_i$ and $\mu = \mathbb{E}[X]$. Then 

\[ \Pr[X > (1 + \alpha)\mu + \beta] \le \exp\left(-\frac{m\alpha\beta}{3}\right), \]

\[ \Pr[X < (1 - \alpha)\mu - \beta] \le \exp\left(-\frac{m\alpha\beta}{2}\right). \]

If we choose an appropriately large constant in the definition of $\ell$, then setting $m = \ell$, $X_i = 
w_{e,i}/d$, $\alpha = \epsilon/2$, and $\beta = \epsilon/2s$ in Lemma C.4, we obtain that $w_e$ is a good estimate with probability 
at least $1 - \epsilon/(sn^2)$. The size of $Z$ is at most $s/\epsilon$ and for each element of $Z$, there are 
at most $n$ weights to be estimated, so the lemma follows by the union bound. □ 

Now, note that if some random string $b$ is good, then all weights calculated by $\mathrm{DCGreedy}(A \cup 
B, b)$ are good also, since $A \cup B \subseteq A \cup \mathrm{OPT}$, and, as we have noted, the consistency property 
implies that GreedyStep picks the same set on inputs $(A \cup B, x, b)$ and $(A, x, b)$ in each iteration. 
We now fix some good $b$, and show that for any $B \subseteq \mathrm{OPT} \setminus A$, we must have: 

\[ F(\mathrm{DCGreedy}(A \cup B, b)) \ge \alpha \cdot f((A \cup B) \cap \mathrm{OPT}), \tag{11} \]

where $\alpha = 1/e - O(\epsilon)$ for non-monotone functions and $\alpha = 1 - 1/e - O(\epsilon)$ for monotone functions. 
When $f$ is monotone, this follows from previous work [3]. Thus we focus on the non-monotone case. 
This will finish the proof of Lemma C.2. 

Let $N = A \cup B$ for some $B \subseteq \mathrm{OPT} \setminus A$ and consider the restricted maximization problem 
$\max_{S \subseteq N,\, S \in \mathcal{I}} f(S)$. We will need the following two lemmas from previous work. The first lemma is 
well-known and it follows from the exchange property of a matroid (see for example [25]). 

Lemma C.5. Let $\mathcal{M} = (N, \mathcal{I})$ be a matroid and let $B_1, B_2 \in \mathcal{I}$ be two bases in the matroid. There 
is a bijection $\pi : B_1 \to B_2$ such that for every element $e \in B_1$ we have $B_1 \setminus \{e\} \cup \{\pi(e)\} \in \mathcal{I}$. 

Lemma C.6 ([15]). Consider a vector $x \in [0,1]^V$. Assuming $x_e \le a$ for every $e \in V$, then for 
every set $S \subseteq N$, $F(x \vee \mathbf{1}_S) \ge (1 - a) f(S)$. 

Now, we begin by showing that DCGreedy improves the current solution by a large amount in 
each step. 

Lemma C.7. Suppose that the randomness $b$ is good. In each iteration $t$ of DCGreedy, $F(x(t)) - 
F(x(t-1)) \ge \epsilon(1 - \epsilon)\left((1 - \epsilon)^t f(N \cap \mathrm{OPT}) - F(x(t))\right) - \epsilon^2 f(\mathrm{OPT})$. 
Proof: Fix an iteration $t$, and for brevity denote $x = x(t-1)$, $x' = x(t)$. Let $W$ be the set of 
elements selected by the GreedyStep for this update, and let $y$ be the associated update vector. 
We suppose without loss of generality that $|W| = s$, where $s$ is the rank of the matroid $\mathcal{I}$, since if 
$|W| < s$ we can simply add $s - |W|$ dummy elements to $W$. Let $e_i$ be the $i$th element added to $W$ 
by GreedyStep and let $y(i)$ be the value of $y$ after $i$ elements have been added to $W$. 

By Lemma C.5, there is a bijective mapping $\pi : N \cap \mathrm{OPT} \to W'$ between $N \cap \mathrm{OPT}$ and a subset 
$W' \subseteq W$ of size $|N \cap \mathrm{OPT}|$ such that, for each element $o \in N \cap \mathrm{OPT}$, $W \setminus \{\pi(o)\} \cup \{o\} \in \mathcal{I}$. For 
each $i \in [s]$, let $o_i := \pi^{-1}(e_i)$ if $e_i \in W'$ and $o_i := e_i$ otherwise. 

For each $i$, we have $w_{e_i} \ge w_{o_i}$ since $o_i$ is a candidate element during the iteration of GreedyStep 
that picked $e_i$. Thus, since all the weights are good estimates, we have 

\begin{align*}
&\mathbb{E}[f(R(x + y(i-1)) \cup \{e_i\}) - f(R(x + y(i-1)))] \\
&\qquad \ge (1 - \epsilon)\,\mathbb{E}[f(R(x + y(i-1)) \cup \{o_i\}) - f(R(x + y(i-1)))] - \frac{\epsilon}{s} f(\mathrm{OPT}) \tag{12}
\end{align*}

for all $i$. Then, we have: 

\begin{align*}
F(x') - F(x) &= F(x + y) - F(x) \\
 &= \sum_{i=1}^{s}\left(F(x + y(i)) - F(x + y(i-1))\right) \\
 &= \sum_{i=1}^{s} \epsilon(1 - x_{e_i}) \cdot \frac{\partial F}{\partial x_{e_i}}\bigg|_{x + y(i-1)} \\
 &= \sum_{i=1}^{s} \epsilon\,\mathbb{E}[f(R(x + y(i-1)) \cup \{e_i\}) - f(R(x + y(i-1)))] \\
 &\ge \sum_{i=1}^{s} \epsilon\left((1 - \epsilon)\,\mathbb{E}[f(R(x + y(i-1)) \cup \{o_i\}) - f(R(x + y(i-1)))] - \frac{\epsilon}{s} f(\mathrm{OPT})\right) \\
 &\ge \sum_{i=1}^{s} \epsilon\left((1 - \epsilon)\,\mathbb{E}[f(R(x') \cup \{o_i\}) - f(R(x'))] - \frac{\epsilon}{s} f(\mathrm{OPT})\right) \\
 &\ge \epsilon(1 - \epsilon)\left(F(x' \vee \mathbf{1}_{N \cap \mathrm{OPT}}) - F(x')\right) - \epsilon^2 f(\mathrm{OPT}), \tag{13}
\end{align*}

where the first inequality follows from (12) and the last two from submodularity. 

We relate the value $F(x' \vee \mathbf{1}_{N \cap \mathrm{OPT}})$ to $f(\mathrm{OPT})$ using Lemma C.6. At each step, we increase 
each coordinate $e$ of $x$ by at most $\epsilon(1 - x_e(t))$. Thus, for any step $0 \le j \le 1/\epsilon$, we have 

\[ x_e(j+1) - x_e(j) \le (1 - x_e(j))\epsilon, \]

or, equivalently, 

\[ x_e(j+1) - (1 - \epsilon)x_e(j) \le \epsilon. \]

Thus, for each time step $t \le 1/\epsilon$, we have 

\[ x_e(t) \le \sum_{i=1}^{t} \epsilon(1 - \epsilon)^{t-i} = 1 - (1 - \epsilon)^t. \]
l-(l-e)‘ 
By combining the inequality above with Lemma C.6, we obtain 

\[ F(x(t) \vee \mathbf{1}_{N \cap \mathrm{OPT}}) \ge (1 - \epsilon)^t f(N \cap \mathrm{OPT}). \]

Plugging this bound into (13) then completes the proof. □ 
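
The coordinate bound $x_e(t) \le 1 - (1-\epsilon)^t$ is tight when a coordinate receives the maximum increase $\epsilon(1 - x_e)$ in every step. The short check below iterates that worst-case recursion numerically; the function name is illustrative.

```python
def max_coordinate(eps, t):
    """Value of a coordinate that is increased by eps * (1 - x) in each of t steps."""
    x = 0.0
    for _ in range(t):
        x += eps * (1.0 - x)
    return x


# After t = 1/eps steps the coordinate stays bounded away from 1.
worst = max_coordinate(0.1, 10)
closed_form = 1.0 - (1.0 - 0.1) ** 10
```

Since $1 - x_e(t) \ge (1-\epsilon)^t$, Lemma C.6 applies with $a = 1 - (1-\epsilon)^t$, which yields the factor $(1-\epsilon)^t$ in the bound above.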

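Lemma C.8. Suppose that the randomness $b$ is good. The final solution $x(1/\epsilon)$ constructed by 
$\mathrm{DCGreedy}(N, b)$ satisfies $F(x(1/\epsilon)) \ge (1/e - \epsilon) f(N \cap \mathrm{OPT}) - \epsilon f(\mathrm{OPT})$. Therefore the integral 
solution $S$ satisfies $f(S) \ge (1/e - \epsilon) f(N \cap \mathrm{OPT}) - \epsilon f(\mathrm{OPT})$. 

Proof: By rearranging the inequality from Lemma C.7, we obtain 

\begin{align*}
F(x(t)) &\ge \frac{\epsilon(1 - \epsilon)^{t+1} f(N \cap \mathrm{OPT}) + F(x(t-1)) - \epsilon^2 f(\mathrm{OPT})}{1 + \epsilon} \\
 &\ge \epsilon(1 - \epsilon)^{t+2} f(N \cap \mathrm{OPT}) + (1 - \epsilon)F(x(t-1)) - \epsilon^2 f(\mathrm{OPT}).
\end{align*}

It follows by induction that 

\[ F(x(t)) \ge t\epsilon(1 - \epsilon)^{t+2} f(N \cap \mathrm{OPT}) - t\epsilon^2 f(\mathrm{OPT}). \]

Thus 

\[ F(x(1/\epsilon)) \ge (1 - \epsilon)^{1/\epsilon + 2} f(N \cap \mathrm{OPT}) - \epsilon f(\mathrm{OPT}) \ge \left(\frac{1}{e} - \epsilon\right) f(N \cap \mathrm{OPT}) - \epsilon f(\mathrm{OPT}). \]

□ 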

Combining Lemmas C.3 and C.8 then completes the proof of Lemma C.2. 

D Improved Analysis of the Two-Round Algorithm for Non-monotone Maximization 
with a Cardinality Constraint 

In this section, we show that for a cardinality constraint, we can slightly improve the analysis of 
the algorithm given in Subsection 7.1. 

Theorem D.1. If $\mathcal{I}$ is a cardinality constraint, the two-round algorithm from Subsection 7.1 
achieves a $\left(1 - \frac{1}{m}\right)\frac{1 - 1/e}{1 + \frac{1}{\beta}(1 - 1/e)}$ approximation in expectation for non-monotone maximization, where 
$\beta$ is the approximation guarantee of Alg. 

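Proof: The analysis is similar to the one in the proof of Theorem 7.1 and we describe the main 
changes in this section. We define $O$ as before, and modify the analysis of the solution $S_i$ as follows. 
Let $S_i^j$ be the subset of $S_i$ consisting of the first $j$ elements picked by Greedy, with $S_i^0 = \emptyset$. By the 
standard analysis of the Greedy algorithm for a cardinality constraint, for each $j \in [k]$, we have 

\[ f(S_i^j) - f(S_i^{j-1}) \ge \frac{1}{k}\left(f(S_i^{j-1} \cup O) - f(S_i^{j-1})\right), \]

and therefore 

\[ f(S_i) \ge \sum_{j=0}^{k-1} \frac{1}{k}\left(1 - \frac{1}{k}\right)^{k-1-j} f(S_i^j \cup O). \]

Now, using $\mathbb{E}[\mathbf{1}_{S_i^j \cup O}] = \mathbb{E}[\mathbf{1}_{S_i^j}] + \mathbb{E}[\mathbf{1}_O] = \mathbb{E}[\mathbf{1}_{S_i^j}] + (\mathbf{1}_{\mathrm{OPT}} - p)$ and Lemma 3.3, we obtain: 

\[ \mathbb{E}[f(S_i^j \cup O)] \ge f^-(\mathbb{E}[\mathbf{1}_{S_i^j}] + \mathbf{1}_{\mathrm{OPT}} - p). \]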
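Therefore 

\[ \mathbb{E}[f(S_i)] \ge \sum_{j=0}^{k-1} \frac{1}{k}\left(1 - \frac{1}{k}\right)^{k-1-j} f^-(\mathbb{E}[\mathbf{1}_{S_i^j}] + \mathbf{1}_{\mathrm{OPT}} - p). \tag{14} \]

We analyze the expected value of the solution $T$ as before, obtaining (4). By combining 
(14) and (4), we get 

\begin{align*}
\mathbb{E}[f(S_i)] + \frac{1}{\beta}\left(1 - \left(1 - \frac{1}{k}\right)^k\right)\mathbb{E}[f(T)] &\ge \sum_{j=0}^{k-1} \frac{1}{k}\left(1 - \frac{1}{k}\right)^{k-1-j}\left(f^-(\mathbb{E}[\mathbf{1}_{S_i^j}] + \mathbf{1}_{\mathrm{OPT}} - p) + f^-(p)\right) \\
 &\ge \sum_{j=0}^{k-1} \frac{1}{k}\left(1 - \frac{1}{k}\right)^{k-1-j} \cdot 2 \cdot f^-\!\left(\frac{\mathbb{E}[\mathbf{1}_{S_i^j}] + \mathbf{1}_{\mathrm{OPT}}}{2}\right),
\end{align*}

where the last inequality follows from the convexity of $f^-$. Since $S_i^j \subseteq V_i$ and $V_i$ is a $1/m$ sample 
of $V$, we have $\mathbb{E}[\mathbf{1}_{S_i^j}] \le \frac{1}{m} \cdot \mathbf{1}_V$. Therefore, using the definition of $f^-$ and the non-negativity of $f$, 
we obtain 

\[ 2 \cdot f^-\!\left(\frac{\mathbb{E}[\mathbf{1}_{S_i^j}] + \mathbf{1}_{\mathrm{OPT}}}{2}\right) \ge \left(1 - \frac{1}{m}\right) f(\mathrm{OPT}), \]

and thus 

\[ \mathbb{E}[f(S_i)] + \frac{1}{\beta}\left(1 - \left(1 - \frac{1}{k}\right)^k\right)\mathbb{E}[f(T)] \ge \left(1 - \left(1 - \frac{1}{k}\right)^k\right)\left(1 - \frac{1}{m}\right) f(\mathrm{OPT}). \]

It follows that 

\begin{align*}
\max\{\mathbb{E}[f(S_i)], \mathbb{E}[f(T)]\} &\ge \left(1 - \frac{1}{m}\right)\frac{1 - \left(1 - \frac{1}{k}\right)^k}{1 + \frac{1}{\beta}\left(1 - \left(1 - \frac{1}{k}\right)^k\right)}\, f(\mathrm{OPT}) \\
 &\ge \left(1 - \frac{1}{m}\right)\frac{1 - \frac{1}{e}}{1 + \frac{1}{\beta}\left(1 - \frac{1}{e}\right)}\, f(\mathrm{OPT}).
\end{align*}

□ 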